Unverified Commit e9c3c0e8 authored by Mufei Li, committed by GitHub

[Model] Simplify RGCN



Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-240.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-123.us-west-2.compute.internal>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
parent 22272de6
@@ -162,3 +162,5 @@ config.cmake
.ycm_extra_conf.py
**.png
# model file
*.pth
# Relational-GCN

* Paper: [Modeling Relational Data with Graph Convolutional Networks](https://arxiv.org/abs/1703.06103)
* Author's code for entity classification: [https://github.com/tkipf/relational-gcn](https://github.com/tkipf/relational-gcn)
* Author's code for link prediction: [https://github.com/MichSchli/RelationPrediction](https://github.com/MichSchli/RelationPrediction)

### Dependencies
* PyTorch 1.10
* rdflib
* pandas
* tqdm
* TorchMetrics
```
pip install rdflib pandas
```
Example code was tested with rdflib 4.2.2 and pandas 0.23.4

### Entity Classification
AIFB: accuracy 96.29% (3 runs, DGL), 95.83% (paper)
```
python entity.py -d aifb --l2norm 0 --gpu 0
```
MUTAG: accuracy 72.55% (3 runs, DGL), 73.23% (paper)
```
python entity.py -d mutag --n-bases 30 --gpu 0
```
BGS: accuracy 89.70% (3 runs, DGL), 83.10% (paper)
```
python entity.py -d bgs --n-bases 40 --gpu 0
```
AM: accuracy 89.56% (3 runs, DGL), 89.29% (paper)
```
python entity.py -d am --n-bases 40 --n-hidden 10
```

### Entity Classification with minibatch
AIFB: accuracy avg(5 runs) 91.10%, best 97.22% (DGL)
```
python entity_sample.py -d aifb --l2norm 0 --gpu 0 --fanout='20,20' --batch-size 128
```
MUTAG: accuracy avg(10 runs) 66.47%, best 72.06% (DGL)
```
python entity_sample.py -d mutag --n-bases 30 --gpu 0 --batch-size 64 --fanout "-1, -1" --use-self-loop --n-epochs 20 --sparse-lr 0.01 --dropout 0.5
```
BGS: accuracy avg(5 runs) 84.83%, best 89.66% (DGL)
```
python entity_sample.py -d bgs --n-bases 40 --gpu 0 --fanout "-1, -1" --n-epochs=16 --batch-size=16 --sparse-lr 0.05 --dropout 0.3
```
AM: accuracy avg(5 runs) 88.58%, best 89.90% (DGL)
```
python entity_sample.py -d am --n-bases 40 --gpu 0 --fanout '35,35' --batch-size 64 --n-hidden 16 --use-self-loop --n-epochs=20 --sparse-lr 0.02 --dropout 0.7
```
To use multiple GPUs, replace `entity_sample.py` with `entity_sample_multi_gpu.py` and specify
multiple GPU IDs separated by commas, e.g., `--gpu 0,1`.
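For example, a minimal two-GPU sketch that simply reuses the hyperparameters of the single-GPU AIFB command above (they are not tuned for the multi-GPU setting):
```
python entity_sample_multi_gpu.py -d aifb --l2norm 0 --gpu 0,1 --fanout='20,20' --batch-size 128
```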

### Link Prediction
FB15k-237: MRR 0.163 (DGL), 0.158 (paper)
```
python link.py --gpu 0 --eval-protocol raw
```
FB15k-237: Filtered-MRR 0.247
```
python link.py --gpu 0 --eval-protocol filtered
```
"""
Differences compared to tkipf/relation-gcn
* l2norm applied to all weights
* remove nodes that won't be touched
"""
import argparse
import torch as th
import torch.nn.functional as F
from torchmetrics.functional import accuracy
from entity_utils import load_data
from model import RGCN
def main(args):
g, num_rels, num_classes, labels, train_idx, test_idx, target_idx = load_data(
args.dataset, get_norm=True)
num_nodes = g.num_nodes()
# Since the nodes are featureless, learn node embeddings from scratch
# This requires passing the node IDs to the model.
feats = th.arange(num_nodes)
model = RGCN(num_nodes,
args.n_hidden,
num_classes,
num_rels,
num_bases=args.n_bases)
if args.gpu >= 0 and th.cuda.is_available():
device = th.device(args.gpu)
else:
device = th.device('cpu')
feats = feats.to(device)
labels = labels.to(device)
model = model.to(device)
g = g.to(device)
optimizer = th.optim.Adam(model.parameters(), lr=1e-2, weight_decay=args.l2norm)
model.train()
for epoch in range(50):
logits = model(g, feats)
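        # Keep only the predictions for nodes of the target category in the homogeneous graph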
logits = logits[target_idx]
loss = F.cross_entropy(logits[train_idx], labels[train_idx])
optimizer.zero_grad()
loss.backward()
optimizer.step()
train_acc = accuracy(logits[train_idx].argmax(dim=1), labels[train_idx]).item()
print("Epoch {:05d} | Train Accuracy: {:.4f} | Train Loss: {:.4f}".format(
epoch, train_acc, loss.item()))
print()
model.eval()
with th.no_grad():
logits = model(g, feats)
logits = logits[target_idx]
test_acc = accuracy(logits[test_idx].argmax(dim=1), labels[test_idx]).item()
print("Test Accuracy: {:.4f}".format(test_acc))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='RGCN for entity classification')
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden units")
parser.add_argument("--gpu", type=int, default=-1,
help="gpu")
parser.add_argument("--n-bases", type=int, default=-1,
help="number of filter weight matrices, default: -1 [use all]")
parser.add_argument("-d", "--dataset", type=str, required=True,
choices=['aifb', 'mutag', 'bgs', 'am'],
help="dataset to use")
parser.add_argument("--l2norm", type=float, default=5e-4,
help="l2 norm coef")
args = parser.parse_args()
print(args)
main(args)
"""
Modeling Relational Data with Graph Convolutional Networks
Paper: https://arxiv.org/abs/1703.06103
Code: https://github.com/tkipf/relational-gcn
Difference compared to tkipf/relation-gcn
* l2norm applied to all weights
* remove nodes that won't be touched
"""
import argparse
import numpy as np
import time
import torch
import torch.nn.functional as F
import dgl
from dgl.nn.pytorch import RelGraphConv
from functools import partial
from dgl.data.rdf import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset
from model import BaseRGCN
class EntityClassify(BaseRGCN):
def create_features(self):
features = torch.arange(self.num_nodes)
if self.use_cuda:
features = features.cuda()
return features
def build_input_layer(self):
return RelGraphConv(self.num_nodes, self.h_dim, self.num_rels, "basis",
self.num_bases, activation=F.relu, self_loop=self.use_self_loop,
dropout=self.dropout)
def build_hidden_layer(self, idx):
return RelGraphConv(self.h_dim, self.h_dim, self.num_rels, "basis",
self.num_bases, activation=F.relu, self_loop=self.use_self_loop,
dropout=self.dropout)
def build_output_layer(self):
return RelGraphConv(self.h_dim, self.out_dim, self.num_rels, "basis",
self.num_bases, activation=None,
self_loop=self.use_self_loop)
def main(args):
# load graph data
if args.dataset == 'aifb':
dataset = AIFBDataset()
elif args.dataset == 'mutag':
dataset = MUTAGDataset()
elif args.dataset == 'bgs':
dataset = BGSDataset()
elif args.dataset == 'am':
dataset = AMDataset()
else:
raise ValueError()
# Load from hetero-graph
hg = dataset[0]
num_rels = len(hg.canonical_etypes)
category = dataset.predict_category
num_classes = dataset.num_classes
train_mask = hg.nodes[category].data.pop('train_mask')
test_mask = hg.nodes[category].data.pop('test_mask')
train_idx = torch.nonzero(train_mask, as_tuple=False).squeeze()
test_idx = torch.nonzero(test_mask, as_tuple=False).squeeze()
labels = hg.nodes[category].data.pop('labels')
# split dataset into train, validate, test
if args.validation:
val_idx = train_idx[:len(train_idx) // 5]
train_idx = train_idx[len(train_idx) // 5:]
else:
val_idx = train_idx
# calculate norm for each edge type and store in edge
for canonical_etype in hg.canonical_etypes:
u, v, eid = hg.all_edges(form='all', etype=canonical_etype)
_, inverse_index, count = torch.unique(v, return_inverse=True, return_counts=True)
degrees = count[inverse_index]
norm = torch.ones(eid.shape[0]).float() / degrees.float()
norm = norm.unsqueeze(1)
hg.edges[canonical_etype].data['norm'] = norm
# get target category id
category_id = len(hg.ntypes)
for i, ntype in enumerate(hg.ntypes):
if ntype == category:
category_id = i
g = dgl.to_homogeneous(hg, edata=['norm'])
num_nodes = g.number_of_nodes()
node_ids = torch.arange(num_nodes)
edge_norm = g.edata['norm']
edge_type = g.edata[dgl.ETYPE].long()
# find out the target node ids in g
node_tids = g.ndata[dgl.NTYPE]
loc = (node_tids == category_id)
target_idx = node_ids[loc]
# since the nodes are featureless, the input feature is then the node id.
feats = torch.arange(num_nodes)
# check cuda
use_cuda = args.gpu >= 0 and torch.cuda.is_available()
if use_cuda:
torch.cuda.set_device(args.gpu)
feats = feats.cuda()
edge_type = edge_type.cuda()
edge_norm = edge_norm.cuda()
labels = labels.cuda()
# create model
model = EntityClassify(num_nodes,
args.n_hidden,
num_classes,
num_rels,
num_bases=args.n_bases,
num_hidden_layers=args.n_layers - 2,
dropout=args.dropout,
use_self_loop=args.use_self_loop,
use_cuda=use_cuda)
if use_cuda:
model.cuda()
g = g.to('cuda:%d' % args.gpu)
# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.l2norm)
# training loop
print("start training...")
forward_time = []
backward_time = []
model.train()
for epoch in range(args.n_epochs):
optimizer.zero_grad()
t0 = time.time()
logits = model(g, feats, edge_type, edge_norm)
logits = logits[target_idx]
loss = F.cross_entropy(logits[train_idx], labels[train_idx])
t1 = time.time()
loss.backward()
optimizer.step()
t2 = time.time()
forward_time.append(t1 - t0)
backward_time.append(t2 - t1)
print("Epoch {:05d} | Train Forward Time(s) {:.4f} | Backward Time(s) {:.4f}".
format(epoch, forward_time[-1], backward_time[-1]))
train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx]).item() / len(train_idx)
val_loss = F.cross_entropy(logits[val_idx], labels[val_idx])
val_acc = torch.sum(logits[val_idx].argmax(dim=1) == labels[val_idx]).item() / len(val_idx)
print("Train Accuracy: {:.4f} | Train Loss: {:.4f} | Validation Accuracy: {:.4f} | Validation loss: {:.4f}".
format(train_acc, loss.item(), val_acc, val_loss.item()))
print()
model.eval()
logits = model.forward(g, feats, edge_type, edge_norm)
logits = logits[target_idx]
test_loss = F.cross_entropy(logits[test_idx], labels[test_idx])
test_acc = torch.sum(logits[test_idx].argmax(dim=1) == labels[test_idx]).item() / len(test_idx)
print("Test Accuracy: {:.4f} | Test loss: {:.4f}".format(test_acc, test_loss.item()))
print()
print("Mean forward time: {:4f}".format(np.mean(forward_time[len(forward_time) // 4:])))
print("Mean backward time: {:4f}".format(np.mean(backward_time[len(backward_time) // 4:])))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='RGCN')
parser.add_argument("--dropout", type=float, default=0,
help="dropout probability")
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden units")
parser.add_argument("--gpu", type=int, default=-1,
help="gpu")
parser.add_argument("--lr", type=float, default=1e-2,
help="learning rate")
parser.add_argument("--n-bases", type=int, default=-1,
help="number of filter weight matrices, default: -1 [use all]")
parser.add_argument("--n-layers", type=int, default=2,
help="number of propagation rounds")
parser.add_argument("-e", "--n-epochs", type=int, default=50,
help="number of training epochs")
parser.add_argument("-d", "--dataset", type=str, required=True,
help="dataset to use")
parser.add_argument("--l2norm", type=float, default=0,
help="l2 norm coef")
parser.add_argument("--use-self-loop", default=False, action='store_true',
help="include self feature as a special relation")
fp = parser.add_mutually_exclusive_group(required=False)
fp.add_argument('--validation', dest='validation', action='store_true')
fp.add_argument('--testing', dest='validation', action='store_false')
parser.set_defaults(validation=True)
args = parser.parse_args()
print(args)
main(args)
This diff is collapsed.
"""
Differences compared to tkipf/relation-gcn
* l2norm applied to all weights
* remove nodes that won't be touched
"""
import argparse
import torch as th
import torch.nn.functional as F
import dgl
from dgl.dataloading import MultiLayerNeighborSampler, NodeDataLoader
from torchmetrics.functional import accuracy
from tqdm import tqdm
from entity_utils import load_data
from model import RelGraphEmbedLayer, RGCN
def init_dataloaders(args, g, train_idx, test_idx, target_idx, device, use_ddp=False):
fanouts = [int(fanout) for fanout in args.fanout.split(',')]
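    # e.g., a fanout of '20,20' samples 20 neighbors per node for each of the two layers; -1 keeps all neighbors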
sampler = MultiLayerNeighborSampler(fanouts)
train_loader = NodeDataLoader(
g,
target_idx[train_idx],
sampler,
use_ddp=use_ddp,
device=device,
batch_size=args.batch_size,
shuffle=True,
drop_last=False)
# The datasets do not have a validation subset, use the train subset
val_loader = NodeDataLoader(
g,
target_idx[train_idx],
sampler,
use_ddp=use_ddp,
device=device,
batch_size=args.batch_size,
shuffle=False,
drop_last=False)
# -1 for sampling all neighbors
test_sampler = MultiLayerNeighborSampler([-1] * len(fanouts))
test_loader = NodeDataLoader(
g,
target_idx[test_idx],
test_sampler,
use_ddp=use_ddp,
device=device,
batch_size=32,
shuffle=False,
drop_last=False)
return train_loader, val_loader, test_loader
def init_models(args, device, num_nodes, num_classes, num_rels):
embed_layer = RelGraphEmbedLayer(device,
num_nodes,
args.n_hidden)
model = RGCN(args.n_hidden,
args.n_hidden,
num_classes,
num_rels,
num_bases=args.n_bases,
dropout=args.dropout,
self_loop=args.use_self_loop)
return embed_layer, model
def process_batch(inv_target, batch):
_, seeds, blocks = batch
# map the seed nodes back to their type-specific ids,
# in order to get the target node labels
seeds = inv_target[seeds]
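    # Precompute the edge normalization within each block: 1 / in-degree of the edge's destination node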
for blc in blocks:
blc.edata['norm'] = dgl.norm_by_dst(blc).unsqueeze(1)
return seeds, blocks
def train(model, embed_layer, train_loader, inv_target,
labels, emb_optimizer, optimizer):
model.train()
embed_layer.train()
for sample_data in train_loader:
seeds, blocks = process_batch(inv_target, sample_data)
feats = embed_layer(blocks[0].srcdata[dgl.NID].cpu())
logits = model(blocks, feats)
loss = F.cross_entropy(logits, labels[seeds])
emb_optimizer.zero_grad()
optimizer.zero_grad()
loss.backward()
emb_optimizer.step()
optimizer.step()
train_acc = accuracy(logits.argmax(dim=1), labels[seeds]).item()
return train_acc, loss.item()
def evaluate(model, embed_layer, eval_loader, inv_target):
model.eval()
embed_layer.eval()
eval_logits = []
eval_seeds = []
with th.no_grad():
for sample_data in tqdm(eval_loader):
seeds, blocks = process_batch(inv_target, sample_data)
feats = embed_layer(blocks[0].srcdata[dgl.NID].cpu())
logits = model(blocks, feats)
eval_logits.append(logits.cpu().detach())
eval_seeds.append(seeds.cpu().detach())
eval_logits = th.cat(eval_logits)
eval_seeds = th.cat(eval_seeds)
return eval_logits, eval_seeds
def main(args):
g, num_rels, num_classes, labels, train_idx, test_idx, target_idx, inv_target = load_data(
args.dataset, inv_target=True)
if args.gpu >= 0 and th.cuda.is_available():
device = th.device(args.gpu)
else:
device = th.device('cpu')
train_loader, val_loader, test_loader = init_dataloaders(
args, g, train_idx, test_idx, target_idx, args.gpu)
embed_layer, model = init_models(args, device, g.num_nodes(), num_classes, num_rels)
labels = labels.to(device)
model = model.to(device)
emb_optimizer = th.optim.SparseAdam(embed_layer.parameters(), lr=args.sparse_lr)
optimizer = th.optim.Adam(model.parameters(), lr=1e-2, weight_decay=args.l2norm)
for epoch in range(args.n_epochs):
train_acc, loss = train(model, embed_layer, train_loader, inv_target,
labels, emb_optimizer, optimizer)
print("Epoch {:05d}/{:05d} | Train Accuracy: {:.4f} | Train Loss: {:.4f}".format(
epoch, args.n_epochs, train_acc, loss))
val_logits, val_seeds = evaluate(model, embed_layer, val_loader, inv_target)
val_acc = accuracy(val_logits.argmax(dim=1), labels[val_seeds].cpu()).item()
print("Validation Accuracy: {:.4f}".format(val_acc))
test_logits, test_seeds = evaluate(model, embed_layer,
test_loader, inv_target)
test_acc = accuracy(test_logits.argmax(dim=1), labels[test_seeds].cpu()).item()
print("Final Test Accuracy: {:.4f}".format(test_acc))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='RGCN for entity classification with sampling')
parser.add_argument("--dropout", type=float, default=0,
help="dropout probability")
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden units")
parser.add_argument("--gpu", type=int, default=0,
help="gpu")
parser.add_argument("--sparse-lr", type=float, default=2e-2,
help="sparse embedding learning rate")
parser.add_argument("--n-bases", type=int, default=-1,
help="number of filter weight matrices, default: -1 [use all]")
parser.add_argument("--n-epochs", type=int, default=50,
help="number of training epochs")
parser.add_argument("-d", "--dataset", type=str, required=True,
choices=['aifb', 'mutag', 'bgs', 'am'],
help="dataset to use")
parser.add_argument("--l2norm", type=float, default=5e-4,
help="l2 norm coef")
parser.add_argument("--fanout", type=str, default="4, 4",
help="Fan-out of neighbor sampling")
parser.add_argument("--use-self-loop", default=False, action='store_true',
help="include self feature as a special relation")
parser.add_argument("--batch-size", type=int, default=100,
help="Mini-batch size")
args = parser.parse_args()
print(args)
main(args)
"""
Differences compared to tkipf/relation-gcn
* l2norm applied to all weights
* remove nodes that won't be touched
"""
import argparse
import gc
import torch as th
import torch.nn.functional as F
import dgl.multiprocessing as mp
import dgl
from torchmetrics.functional import accuracy
from torch.nn.parallel import DistributedDataParallel
from entity_utils import load_data
from entity_sample import init_dataloaders, init_models, train, evaluate
def collect_eval(n_gpus, queue, labels):
eval_logits = []
eval_seeds = []
for _ in range(n_gpus):
eval_l, eval_s = queue.get()
eval_logits.append(eval_l)
eval_seeds.append(eval_s)
eval_logits = th.cat(eval_logits)
eval_seeds = th.cat(eval_seeds)
eval_acc = accuracy(eval_logits.argmax(dim=1), labels[eval_seeds].cpu()).item()
return eval_acc
def run(proc_id, n_gpus, n_cpus, args, devices, dataset, queue=None):
dev_id = devices[proc_id]
g, num_classes, num_rels, target_idx, inv_target, train_idx,\
test_idx, labels = dataset
dist_init_method = 'tcp://{master_ip}:{master_port}'.format(
master_ip='127.0.0.1', master_port='12345')
backend = 'gloo'
if proc_id == 0:
print("backend using {}".format(backend))
th.distributed.init_process_group(backend=backend,
init_method=dist_init_method,
world_size=n_gpus,
rank=proc_id)
device = th.device(dev_id)
use_ddp = True if n_gpus > 1 else False
train_loader, val_loader, test_loader = init_dataloaders(
args, g, train_idx, test_idx, target_idx, dev_id, use_ddp=use_ddp)
embed_layer, model = init_models(args, device, g.num_nodes(), num_classes, num_rels)
labels = labels.to(device)
model = model.to(device)
model = DistributedDataParallel(model, device_ids=[dev_id], output_device=dev_id)
embed_layer = DistributedDataParallel(embed_layer, device_ids=None, output_device=None)
emb_optimizer = th.optim.SparseAdam(embed_layer.module.parameters(), lr=args.sparse_lr)
optimizer = th.optim.Adam(model.parameters(), lr=1e-2, weight_decay=args.l2norm)
th.set_num_threads(n_cpus)
for epoch in range(args.n_epochs):
train_loader.set_epoch(epoch)
train_acc, loss = train(model, embed_layer, train_loader, inv_target,
labels, emb_optimizer, optimizer)
if proc_id == 0:
print("Epoch {:05d}/{:05d} | Train Accuracy: {:.4f} | Train Loss: {:.4f}".format(
epoch, args.n_epochs, train_acc, loss))
# garbage collection that empties the queue
gc.collect()
val_logits, val_seeds = evaluate(model, embed_layer, val_loader, inv_target)
queue.put((val_logits, val_seeds))
# gather evaluation result from multiple processes
if proc_id == 0:
val_acc = collect_eval(n_gpus, queue, labels)
print("Validation Accuracy: {:.4f}".format(val_acc))
# garbage collection that empties the queue
gc.collect()
test_logits, test_seeds = evaluate(model, embed_layer, test_loader, inv_target)
queue.put((test_logits, test_seeds))
if proc_id == 0:
test_acc = collect_eval(n_gpus, queue, labels)
print("Final Test Accuracy: {:.4f}".format(test_acc))
th.distributed.barrier()
def main(args, devices):
g, num_rels, num_classes, labels, train_idx, test_idx, target_idx, inv_target = load_data(
args.dataset, inv_target=True)
# Create csr/coo/csc formats before launching training processes.
# This avoids creating certain formats in each sub-process, which saves memory and CPU.
g.create_formats_()
n_gpus = len(devices)
n_cpus = mp.cpu_count()
queue = mp.Queue(n_gpus)
procs = []
for proc_id in range(n_gpus):
# We use distributed data parallel dataloader to handle the data splitting
p = mp.Process(target=run, args=(proc_id, n_gpus, n_cpus // n_gpus, args, devices,
(g, num_classes, num_rels, target_idx,
inv_target, train_idx, test_idx, labels),
queue))
p.start()
procs.append(p)
for p in procs:
p.join()
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='RGCN for entity classification with sampling and multiple gpus')
parser.add_argument("--dropout", type=float, default=0,
help="dropout probability")
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden units")
parser.add_argument("--gpu", type=str, default='0',
help="gpu")
parser.add_argument("--sparse-lr", type=float, default=2e-2,
help="sparse embedding learning rate")
parser.add_argument("--n-bases", type=int, default=-1,
help="number of filter weight matrices, default: -1 [use all]")
parser.add_argument("--n-epochs", type=int, default=50,
help="number of training epochs")
parser.add_argument("-d", "--dataset", type=str, required=True,
choices=['aifb', 'mutag', 'bgs', 'am'],
help="dataset to use")
parser.add_argument("--l2norm", type=float, default=5e-4,
help="l2 norm coef")
parser.add_argument("--fanout", type=str, default="4, 4",
help="Fan-out of neighbor sampling")
parser.add_argument("--use-self-loop", default=False, action='store_true',
help="include self feature as a special relation")
parser.add_argument("--batch-size", type=int, default=100,
help="Mini-batch size. ")
args = parser.parse_args()
devices = list(map(int, args.gpu.split(',')))
print(args)
main(args, devices)
import dgl
import torch as th
from dgl.data.rdf import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset
def load_data(data_name, get_norm=False, inv_target=False):
if data_name == 'aifb':
dataset = AIFBDataset()
elif data_name == 'mutag':
dataset = MUTAGDataset()
elif data_name == 'bgs':
dataset = BGSDataset()
else:
dataset = AMDataset()
# Load hetero-graph
hg = dataset[0]
num_rels = len(hg.canonical_etypes)
category = dataset.predict_category
num_classes = dataset.num_classes
labels = hg.nodes[category].data.pop('labels')
train_mask = hg.nodes[category].data.pop('train_mask')
test_mask = hg.nodes[category].data.pop('test_mask')
train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
if get_norm:
# Calculate normalization weight for each edge,
# 1. / d, d is the degree of the destination node
for cetype in hg.canonical_etypes:
hg.edges[cetype].data['norm'] = dgl.norm_by_dst(hg, cetype).unsqueeze(1)
edata = ['norm']
else:
edata = None
# get target category id
category_id = hg.ntypes.index(category)
g = dgl.to_homogeneous(hg, edata=edata)
# Rename the fields as they can be changed by for example NodeDataLoader
g.ndata['ntype'] = g.ndata.pop(dgl.NTYPE)
g.ndata['type_id'] = g.ndata.pop(dgl.NID)
node_ids = th.arange(g.num_nodes())
# find out the target node ids in g
loc = (g.ndata['ntype'] == category_id)
target_idx = node_ids[loc]
if inv_target:
# Map global node IDs to type-specific node IDs. This is required for
# looking up type-specific labels in a minibatch
inv_target = th.empty((g.num_nodes(),), dtype=th.int64)
inv_target[target_idx] = th.arange(0, target_idx.shape[0],
dtype=inv_target.dtype)
return g, num_rels, num_classes, labels, train_idx, test_idx, target_idx, inv_target
else:
return g, num_rels, num_classes, labels, train_idx, test_idx, target_idx
"""
Differences compared to MichSchli/RelationPrediction
* Report raw metrics instead of filtered metrics.
* By default, we use uniform edge sampling instead of neighbor-based edge
sampling used in the author's code. In practice, we find it achieves a similar MRR.
"""
import argparse
import torch as th
import torch.nn as nn
import torch.nn.functional as F
from dgl.data.knowledge_graph import FB15k237Dataset
from dgl.dataloading import GraphDataLoader
from link_utils import preprocess, SubgraphIterator, calc_mrr
from model import RGCN
class LinkPredict(nn.Module):
def __init__(self, in_dim, num_rels, h_dim=500, num_bases=100, dropout=0.2, reg_param=0.01):
super(LinkPredict, self).__init__()
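        # num_rels * 2: the graphs built in link_utils.py include an inverse edge for each relation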
self.rgcn = RGCN(in_dim, h_dim, h_dim, num_rels * 2, regularizer="bdd",
num_bases=num_bases, dropout=dropout, self_loop=True, link_pred=True)
self.reg_param = reg_param
self.w_relation = nn.Parameter(th.Tensor(num_rels, h_dim))
nn.init.xavier_uniform_(self.w_relation,
gain=nn.init.calculate_gain('relu'))
def calc_score(self, embedding, triplets):
# DistMult
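        # score(s, r, o) = sum_k s_k * r_k * o_k, with one learned vector per relation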
s = embedding[triplets[:,0]]
r = self.w_relation[triplets[:,1]]
o = embedding[triplets[:,2]]
score = th.sum(s * r * o, dim=1)
return score
def forward(self, g, h):
return self.rgcn(g, h)
def regularization_loss(self, embedding):
return th.mean(embedding.pow(2)) + th.mean(self.w_relation.pow(2))
def get_loss(self, embed, triplets, labels):
# each row in the triplets is a 3-tuple of (source, relation, destination)
score = self.calc_score(embed, triplets)
predict_loss = F.binary_cross_entropy_with_logits(score, labels)
reg_loss = self.regularization_loss(embed)
return predict_loss + self.reg_param * reg_loss
def main(args):
data = FB15k237Dataset(reverse=False)
graph = data[0]
num_nodes = graph.num_nodes()
num_rels = data.num_rels
train_g, test_g = preprocess(graph, num_rels)
test_node_id = th.arange(0, num_nodes).view(-1, 1)
test_mask = graph.edata['test_mask']
subg_iter = SubgraphIterator(train_g, num_rels, args.edge_sampler)
dataloader = GraphDataLoader(subg_iter, batch_size=1, collate_fn=lambda x: x[0])
# Prepare data for metric computation
src, dst = graph.edges()
triplets = th.stack([src, graph.edata['etype'], dst], dim=1)
model = LinkPredict(num_nodes, num_rels)
optimizer = th.optim.Adam(model.parameters(), lr=1e-2)
if args.gpu >= 0 and th.cuda.is_available():
device = th.device(args.gpu)
else:
device = th.device('cpu')
model = model.to(device)
best_mrr = 0
model_state_file = 'model_state.pth'
for epoch, batch_data in enumerate(dataloader):
model.train()
g, node_id, data, labels = batch_data
g = g.to(device)
node_id = node_id.to(device)
data = data.to(device)
labels = labels.to(device)
embed = model(g, node_id)
loss = model.get_loss(embed, data, labels)
optimizer.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) # clip gradients
optimizer.step()
print("Epoch {:04d} | Loss {:.4f} | Best MRR {:.4f}".format(epoch, loss.item(), best_mrr))
if (epoch + 1) % 500 == 0:
# perform validation on CPU because full graph is too large
model = model.cpu()
model.eval()
print("start eval")
embed = model(test_g, test_node_id)
mrr = calc_mrr(embed, model.w_relation, test_mask, triplets,
batch_size=500, eval_p=args.eval_protocol)
# save best model
if best_mrr < mrr:
best_mrr = mrr
th.save({'state_dict': model.state_dict(), 'epoch': epoch}, model_state_file)
model = model.to(device)
print("Start testing:")
# use best model checkpoint
checkpoint = th.load(model_state_file)
model = model.cpu() # test on CPU
model.eval()
model.load_state_dict(checkpoint['state_dict'])
print("Using best epoch: {}".format(checkpoint['epoch']))
embed = model(test_g, test_node_id)
calc_mrr(embed, model.w_relation, test_mask, triplets,
batch_size=500, eval_p=args.eval_protocol)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='RGCN for link prediction')
parser.add_argument("--gpu", type=int, default=-1,
help="gpu")
parser.add_argument("--eval-protocol", type=str, default='filtered',
choices=['filtered', 'raw'],
help="Whether to use 'filtered' or 'raw' MRR for evaluation")
parser.add_argument("--edge-sampler", type=str, default='uniform',
choices=['uniform', 'neighbor'],
help="Type of edge sampler: 'uniform' or 'neighbor'"
"The original implementation uses neighbor sampler.")
args = parser.parse_args()
print(args)
main(args)
"""
Modeling Relational Data with Graph Convolutional Networks
Paper: https://arxiv.org/abs/1703.06103
Code: https://github.com/MichSchli/RelationPrediction
Difference compared to MichSchli/RelationPrediction
* Report raw metrics instead of filtered metrics.
* By default, we use uniform edge sampling instead of neighbor-based edge
sampling used in the author's code. In practice, we find it achieves a similar MRR. Users can specify "--edge-sampler=neighbor" to switch
to neighbor-based edge sampling.
"""
import argparse
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import random
from dgl.data.knowledge_graph import load_data
from dgl.nn.pytorch import RelGraphConv
from model import BaseRGCN
import utils
class EmbeddingLayer(nn.Module):
def __init__(self, num_nodes, h_dim):
super(EmbeddingLayer, self).__init__()
self.embedding = torch.nn.Embedding(num_nodes, h_dim)
def forward(self, g, h, r, norm):
return self.embedding(h.squeeze())
class RGCN(BaseRGCN):
def build_input_layer(self):
return EmbeddingLayer(self.num_nodes, self.h_dim)
def build_hidden_layer(self, idx):
act = F.relu if idx < self.num_hidden_layers - 1 else None
return RelGraphConv(self.h_dim, self.h_dim, self.num_rels, "bdd",
self.num_bases, activation=act, self_loop=True,
dropout=self.dropout)
class LinkPredict(nn.Module):
def __init__(self, in_dim, h_dim, num_rels, num_bases=-1,
num_hidden_layers=1, dropout=0, use_cuda=False, reg_param=0):
super(LinkPredict, self).__init__()
self.rgcn = RGCN(in_dim, h_dim, h_dim, num_rels * 2, num_bases,
num_hidden_layers, dropout, use_cuda)
self.reg_param = reg_param
self.w_relation = nn.Parameter(torch.Tensor(num_rels, h_dim))
nn.init.xavier_uniform_(self.w_relation,
gain=nn.init.calculate_gain('relu'))
def calc_score(self, embedding, triplets):
# DistMult
s = embedding[triplets[:,0]]
r = self.w_relation[triplets[:,1]]
o = embedding[triplets[:,2]]
score = torch.sum(s * r * o, dim=1)
return score
def forward(self, g, h, r, norm):
return self.rgcn.forward(g, h, r, norm)
def regularization_loss(self, embedding):
return torch.mean(embedding.pow(2)) + torch.mean(self.w_relation.pow(2))
def get_loss(self, g, embed, triplets, labels):
# triplets is a list of data samples (positive and negative)
# each row in the triplets is a 3-tuple of (source, relation, destination)
score = self.calc_score(embed, triplets)
predict_loss = F.binary_cross_entropy_with_logits(score, labels)
reg_loss = self.regularization_loss(embed)
return predict_loss + self.reg_param * reg_loss
def node_norm_to_edge_norm(g, node_norm):
g = g.local_var()
# convert to edge norm
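    # each edge takes the normalization factor of its destination node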
g.ndata['norm'] = node_norm
g.apply_edges(lambda edges : {'norm' : edges.dst['norm']})
return g.edata['norm']
def main(args):
# load graph data
data = load_data(args.dataset)
num_nodes = data.num_nodes
train_data = data.train
valid_data = data.valid
test_data = data.test
num_rels = data.num_rels
# check cuda
use_cuda = args.gpu >= 0 and torch.cuda.is_available()
if use_cuda:
torch.cuda.set_device(args.gpu)
# create model
model = LinkPredict(num_nodes,
args.n_hidden,
num_rels,
num_bases=args.n_bases,
num_hidden_layers=args.n_layers,
dropout=args.dropout,
use_cuda=use_cuda,
reg_param=args.regularization)
# validation and testing triplets
valid_data = torch.LongTensor(valid_data)
test_data = torch.LongTensor(test_data)
# build test graph
test_graph, test_rel, test_norm = utils.build_test_graph(
num_nodes, num_rels, train_data)
test_deg = test_graph.in_degrees(
range(test_graph.number_of_nodes())).float().view(-1,1)
test_node_id = torch.arange(0, num_nodes, dtype=torch.long).view(-1, 1)
test_rel = torch.from_numpy(test_rel)
test_norm = node_norm_to_edge_norm(test_graph, torch.from_numpy(test_norm).view(-1, 1))
if use_cuda:
model.cuda()
# build adj list and calculate degrees for sampling
adj_list, degrees = utils.get_adj_and_degrees(num_nodes, train_data)
# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
model_state_file = 'model_state.pth'
forward_time = []
backward_time = []
# training loop
print("start training...")
epoch = 0
best_mrr = 0
while True:
model.train()
epoch += 1
# perform edge neighborhood sampling to generate training graph and data
g, node_id, edge_type, node_norm, data, labels = \
utils.generate_sampled_graph_and_labels(
train_data, args.graph_batch_size, args.graph_split_size,
num_rels, adj_list, degrees, args.negative_sample,
args.edge_sampler)
print("Done edge sampling")
# set node/edge feature
node_id = torch.from_numpy(node_id).view(-1, 1).long()
edge_type = torch.from_numpy(edge_type)
edge_norm = node_norm_to_edge_norm(g, torch.from_numpy(node_norm).view(-1, 1))
data, labels = torch.from_numpy(data), torch.from_numpy(labels)
deg = g.in_degrees(range(g.number_of_nodes())).float().view(-1, 1)
if use_cuda:
node_id, deg = node_id.cuda(), deg.cuda()
edge_type, edge_norm = edge_type.cuda(), edge_norm.cuda()
data, labels = data.cuda(), labels.cuda()
g = g.to(args.gpu)
t0 = time.time()
embed = model(g, node_id, edge_type, edge_norm)
loss = model.get_loss(g, embed, data, labels)
t1 = time.time()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), args.grad_norm) # clip gradients
optimizer.step()
t2 = time.time()
forward_time.append(t1 - t0)
backward_time.append(t2 - t1)
print("Epoch {:04d} | Loss {:.4f} | Best MRR {:.4f} | Forward {:.4f}s | Backward {:.4f}s".
format(epoch, loss.item(), best_mrr, forward_time[-1], backward_time[-1]))
optimizer.zero_grad()
# validation
if epoch % args.evaluate_every == 0:
# perform validation on CPU because full graph is too large
if use_cuda:
model.cpu()
model.eval()
print("start eval")
embed = model(test_graph, test_node_id, test_rel, test_norm)
mrr = utils.calc_mrr(embed, model.w_relation, torch.LongTensor(train_data),
valid_data, test_data, hits=[1, 3, 10], eval_bz=args.eval_batch_size,
eval_p=args.eval_protocol)
# save best model
if best_mrr < mrr:
best_mrr = mrr
torch.save({'state_dict': model.state_dict(), 'epoch': epoch}, model_state_file)
if epoch >= args.n_epochs:
break
if use_cuda:
model.cuda()
print("training done")
print("Mean forward time: {:4f}s".format(np.mean(forward_time)))
print("Mean Backward time: {:4f}s".format(np.mean(backward_time)))
print("\nstart testing:")
# use best model checkpoint
checkpoint = torch.load(model_state_file)
if use_cuda:
model.cpu() # test on CPU
model.eval()
model.load_state_dict(checkpoint['state_dict'])
print("Using best epoch: {}".format(checkpoint['epoch']))
embed = model(test_graph, test_node_id, test_rel, test_norm)
utils.calc_mrr(embed, model.w_relation, torch.LongTensor(train_data), valid_data,
test_data, hits=[1, 3, 10], eval_bz=args.eval_batch_size, eval_p=args.eval_protocol)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='RGCN')
parser.add_argument("--dropout", type=float, default=0.2,
help="dropout probability")
parser.add_argument("--n-hidden", type=int, default=500,
help="number of hidden units")
parser.add_argument("--gpu", type=int, default=-1,
help="gpu")
parser.add_argument("--lr", type=float, default=1e-2,
help="learning rate")
parser.add_argument("--n-bases", type=int, default=100,
help="number of weight blocks for each relation")
parser.add_argument("--n-layers", type=int, default=2,
help="number of propagation rounds")
parser.add_argument("--n-epochs", type=int, default=6000,
help="number of minimum training epochs")
parser.add_argument("-d", "--dataset", type=str, required=True,
help="dataset to use")
parser.add_argument("--eval-batch-size", type=int, default=500,
help="batch size when evaluating")
parser.add_argument("--eval-protocol", type=str, default="filtered",
help="type of evaluation protocol: 'raw' or 'filtered' mrr")
parser.add_argument("--regularization", type=float, default=0.01,
help="regularization weight")
parser.add_argument("--grad-norm", type=float, default=1.0,
help="norm to clip gradient to")
parser.add_argument("--graph-batch-size", type=int, default=30000,
help="number of edges to sample in each iteration")
parser.add_argument("--graph-split-size", type=float, default=0.5,
help="portion of edges used as positive sample")
parser.add_argument("--negative-sample", type=int, default=10,
help="number of negative samples per positive sample")
parser.add_argument("--evaluate-every", type=int, default=500,
help="perform evaluation every n epochs")
parser.add_argument("--edge-sampler", type=str, default="uniform",
help="type of edge sampler: 'uniform' or 'neighbor'")
args = parser.parse_args()
print(args)
main(args)
"""
Utility functions for link prediction
Most code is adapted from authors' implementation of RGCN link prediction:
https://github.com/MichSchli/RelationPrediction
"""
import numpy as np
import torch as th
import dgl
# Utility function for building training and testing graphs
def get_subset_g(g, mask, num_rels, bidirected=False):
src, dst = g.edges()
sub_src = src[mask]
sub_dst = dst[mask]
sub_rel = g.edata['etype'][mask]
if bidirected:
sub_src, sub_dst = th.cat([sub_src, sub_dst]), th.cat([sub_dst, sub_src])
sub_rel = th.cat([sub_rel, sub_rel + num_rels])
sub_g = dgl.graph((sub_src, sub_dst), num_nodes=g.num_nodes())
sub_g.edata[dgl.ETYPE] = sub_rel
return sub_g
def preprocess(g, num_rels):
# Get train graph
train_g = get_subset_g(g, g.edata['train_mask'], num_rels)
# Get test graph
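    # The test graph reuses the training edges (made bidirectional) for message passing;
    # test triplets are only scored against the resulting embeddings.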
test_g = get_subset_g(g, g.edata['train_mask'], num_rels, bidirected=True)
test_g.edata['norm'] = dgl.norm_by_dst(test_g).unsqueeze(-1)
return train_g, test_g
class GlobalUniform:
def __init__(self, g, sample_size):
self.sample_size = sample_size
self.eids = np.arange(g.num_edges())
def sample(self):
return th.from_numpy(np.random.choice(self.eids, self.sample_size))
class NeighborExpand:
"""Sample a connected component by neighborhood expansion"""
def __init__(self, g, sample_size):
self.g = g
self.nids = np.arange(g.num_nodes())
self.sample_size = sample_size
def sample(self):
edges = th.zeros((self.sample_size), dtype=th.int64)
neighbor_counts = (self.g.in_degrees() + self.g.out_degrees()).numpy()
seen_edge = np.array([False] * self.g.num_edges())
seen_node = np.array([False] * self.g.num_nodes())
for i in range(self.sample_size):
if np.sum(seen_node) == 0:
node_weights = np.ones_like(neighbor_counts)
node_weights[np.where(neighbor_counts == 0)] = 0
else:
# Sample a visited node if applicable.
# This guarantees a connected component.
node_weights = neighbor_counts * seen_node
node_probs = node_weights / np.sum(node_weights)
chosen_node = np.random.choice(self.nids, p=node_probs)
# Sample a neighbor of the sampled node
u1, v1, eid1 = self.g.in_edges(chosen_node, form='all')
u2, v2, eid2 = self.g.out_edges(chosen_node, form='all')
u = th.cat([u1, u2])
v = th.cat([v1, v2])
eid = th.cat([eid1, eid2])
to_pick = True
while to_pick:
random_id = th.randint(high=eid.shape[0], size=(1,))
chosen_eid = eid[random_id]
to_pick = seen_edge[chosen_eid]
chosen_u = u[random_id]
chosen_v = v[random_id]
edges[i] = chosen_eid
seen_node[chosen_u] = True
seen_node[chosen_v] = True
seen_edge[chosen_eid] = True
neighbor_counts[chosen_u] -= 1
neighbor_counts[chosen_v] -= 1
return edges
class NegativeSampler:
def __init__(self, k=10):
self.k = k
def sample(self, pos_samples, num_nodes):
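        # Corrupt each positive triplet k times, replacing the head with probability 0.5 and the tail otherwise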
batch_size = len(pos_samples)
neg_batch_size = batch_size * self.k
neg_samples = np.tile(pos_samples, (self.k, 1))
values = np.random.randint(num_nodes, size=neg_batch_size)
choices = np.random.uniform(size=neg_batch_size)
subj = choices > 0.5
obj = choices <= 0.5
neg_samples[subj, 0] = values[subj]
neg_samples[obj, 2] = values[obj]
samples = np.concatenate((pos_samples, neg_samples))
# binary labels indicating positive and negative samples
labels = np.zeros(batch_size * (self.k + 1), dtype=np.float32)
labels[:batch_size] = 1
return th.from_numpy(samples), th.from_numpy(labels)
class SubgraphIterator:
def __init__(self, g, num_rels, pos_sampler, sample_size=30000, num_epochs=6000):
self.g = g
self.num_rels = num_rels
self.sample_size = sample_size
self.num_epochs = num_epochs
if pos_sampler == 'neighbor':
self.pos_sampler = NeighborExpand(g, sample_size)
else:
self.pos_sampler = GlobalUniform(g, sample_size)
self.neg_sampler = NegativeSampler()
def __len__(self):
return self.num_epochs
def __getitem__(self, i):
eids = self.pos_sampler.sample()
src, dst = self.g.find_edges(eids)
src, dst = src.numpy(), dst.numpy()
rel = self.g.edata[dgl.ETYPE][eids].numpy()
# relabel nodes to have consecutive node IDs
uniq_v, edges = np.unique((src, dst), return_inverse=True)
num_nodes = len(uniq_v)
# edges is the concatenation of src, dst with relabeled ID
src, dst = np.reshape(edges, (2, -1))
relabeled_data = np.stack((src, rel, dst)).transpose()
samples, labels = self.neg_sampler.sample(relabeled_data, num_nodes)
# Use only half of the positive edges
chosen_ids = np.random.choice(np.arange(self.sample_size),
size=int(self.sample_size / 2),
replace=False)
src = src[chosen_ids]
dst = dst[chosen_ids]
rel = rel[chosen_ids]
src, dst = np.concatenate((src, dst)), np.concatenate((dst, src))
rel = np.concatenate((rel, rel + self.num_rels))
sub_g = dgl.graph((src, dst), num_nodes=num_nodes)
sub_g.edata[dgl.ETYPE] = th.from_numpy(rel)
sub_g.edata['norm'] = dgl.norm_by_dst(sub_g).unsqueeze(-1)
uniq_v = th.from_numpy(uniq_v).view(-1, 1).long()
return sub_g, uniq_v, samples, labels
# Utility functions for evaluations (raw)
def perturb_and_get_raw_rank(emb, w, a, r, b, test_size, batch_size=100):
""" Perturb one element in the triplets"""
n_batch = (test_size + batch_size - 1) // batch_size
ranks = []
emb = emb.transpose(0, 1) # size D x V
w = w.transpose(0, 1) # size D x R
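    # For each (a, r) pair, score all nodes as the missing entity at once:
    # bmm((D x E x 1), (D x 1 x V)) -> D x E x V, summed over D to give E x V DistMult scores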
for idx in range(n_batch):
print("batch {} / {}".format(idx, n_batch))
batch_start = idx * batch_size
batch_end = (idx + 1) * batch_size
batch_a = a[batch_start: batch_end]
batch_r = r[batch_start: batch_end]
emb_ar = emb[:,batch_a] * w[:,batch_r] # size D x E
emb_ar = emb_ar.unsqueeze(2) # size D x E x 1
emb_c = emb.unsqueeze(1) # size D x 1 x V
# out-prod and reduce sum
out_prod = th.bmm(emb_ar, emb_c) # size D x E x V
score = th.sum(out_prod, dim=0).sigmoid() # size E x V
target = b[batch_start: batch_end]
_, indices = th.sort(score, dim=1, descending=True)
indices = th.nonzero(indices == target.view(-1, 1), as_tuple=False)
ranks.append(indices[:, 1].view(-1))
return th.cat(ranks)
# Utility functions for evaluations (filtered)
def filter(triplets_to_filter, target_s, target_r, target_o, num_nodes, filter_o=True):
"""Get candidate heads or tails to score"""
target_s, target_r, target_o = int(target_s), int(target_r), int(target_o)
# Add the ground truth node first
if filter_o:
candidate_nodes = [target_o]
else:
candidate_nodes = [target_s]
for e in range(num_nodes):
triplet = (target_s, target_r, e) if filter_o else (e, target_r, target_o)
# Do not consider a node if it leads to a real triplet
if triplet not in triplets_to_filter:
candidate_nodes.append(e)
return th.LongTensor(candidate_nodes)
def perturb_and_get_filtered_rank(emb, w, s, r, o, test_size, triplets_to_filter, filter_o=True):
"""Perturb subject or object in the triplets"""
num_nodes = emb.shape[0]
ranks = []
for idx in range(test_size):
if idx % 100 == 0:
print("test triplet {} / {}".format(idx, test_size))
target_s = s[idx]
target_r = r[idx]
target_o = o[idx]
candidate_nodes = filter(triplets_to_filter, target_s, target_r,
target_o, num_nodes, filter_o=filter_o)
if filter_o:
emb_s = emb[target_s]
emb_o = emb[candidate_nodes]
else:
emb_s = emb[candidate_nodes]
emb_o = emb[target_o]
target_idx = 0
emb_r = w[target_r]
emb_triplet = emb_s * emb_r * emb_o
scores = th.sigmoid(th.sum(emb_triplet, dim=1))
_, indices = th.sort(scores, descending=True)
rank = int((indices == target_idx).nonzero())
ranks.append(rank)
return th.LongTensor(ranks)
def _calc_mrr(emb, w, test_mask, triplets_to_filter, batch_size, filter=False):
with th.no_grad():
test_triplets = triplets_to_filter[test_mask]
s, r, o = test_triplets[:,0], test_triplets[:,1], test_triplets[:,2]
test_size = len(s)
if filter:
metric_name = 'MRR (filtered)'
triplets_to_filter = {tuple(triplet) for triplet in triplets_to_filter.tolist()}
ranks_s = perturb_and_get_filtered_rank(emb, w, s, r, o, test_size,
triplets_to_filter, filter_o=False)
ranks_o = perturb_and_get_filtered_rank(emb, w, s, r, o,
test_size, triplets_to_filter)
else:
metric_name = 'MRR (raw)'
ranks_s = perturb_and_get_raw_rank(emb, w, o, r, s, test_size, batch_size)
ranks_o = perturb_and_get_raw_rank(emb, w, s, r, o, test_size, batch_size)
ranks = th.cat([ranks_s, ranks_o])
ranks += 1 # change to 1-indexed
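        # MRR = mean of 1 / rank over all head and tail predictions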
mrr = th.mean(1.0 / ranks.float()).item()
print("{}: {:.6f}".format(metric_name, mrr))
return mrr
# Main evaluation function
def calc_mrr(emb, w, test_mask, triplets, batch_size=100, eval_p="filtered"):
if eval_p == "filtered":
mrr = _calc_mrr(emb, w, test_mask, triplets, batch_size, filter=True)
else:
mrr = _calc_mrr(emb, w, test_mask, triplets, batch_size)
return mrr
from dgl import DGLGraph
import torch as th
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.nn.pytorch import RelGraphConv

class RGCN(nn.Module):
    def __init__(self, in_dim, h_dim, out_dim, num_rels,
                 regularizer="basis", num_bases=-1, dropout=0.,
                 self_loop=False, link_pred=False):
        super(RGCN, self).__init__()

        self.layers = nn.ModuleList()
        if link_pred:
            self.emb = nn.Embedding(in_dim, h_dim)
            in_dim = h_dim
        else:
            self.emb = None
        self.layers.append(RelGraphConv(in_dim, h_dim, num_rels, regularizer,
                                        num_bases, activation=F.relu, self_loop=self_loop,
                                        dropout=dropout))

        # For entity classification, dropout should not be applied to the output layer
        if not link_pred:
            dropout = 0.
        self.layers.append(RelGraphConv(h_dim, out_dim, num_rels, regularizer,
                                        num_bases, self_loop=self_loop, dropout=dropout))

    def forward(self, g, h):
        if isinstance(g, DGLGraph):
            blocks = [g] * len(self.layers)
        else:
            blocks = g

        if self.emb is not None:
            h = self.emb(h.squeeze())
        for layer, block in zip(self.layers, blocks):
            h = layer(block, h, block.edata[dgl.ETYPE], block.edata['norm'])
        return h

def initializer(emb):
@@ -55,112 +46,42 @@ def initializer(emb):
    return emb

class RelGraphEmbedLayer(nn.Module):
    """Embedding layer for featureless heterograph.

    Parameters
    ----------
    out_dev
        Device to store the output embeddings
    num_nodes : int
        Number of nodes in the graph.
    embed_size : int
        Output embed size
    """
    def __init__(self,
                 out_dev,
                 num_nodes,
                 embed_size):
        super(RelGraphEmbedLayer, self).__init__()
        self.out_dev = out_dev
        self.embed_size = embed_size

        # create embeddings for all nodes
        self.node_embed = nn.Embedding(num_nodes, embed_size, sparse=True)
        nn.init.uniform_(self.node_embed.weight, -1.0, 1.0)

    def forward(self, node_ids):
        """Forward computation

        Parameters
        ----------
        node_ids : tensor
            Raw node IDs.

        Returns
        -------
        tensor
            embeddings as the input of the next layer
        """
        embeds = self.node_embed(node_ids).to(self.out_dev)
        return embeds
"""
Utility functions for link prediction
Most code is adapted from authors' implementation of RGCN link prediction:
https://github.com/MichSchli/RelationPrediction
"""
import numpy as np
import torch
from torch.multiprocessing import Queue
import dgl
#######################################################################
#
# Utility function for building training and testing graphs
#
#######################################################################
def get_adj_and_degrees(num_nodes, triplets):
""" Get adjacency list and degrees of the graph
"""
adj_list = [[] for _ in range(num_nodes)]
for i,triplet in enumerate(triplets):
adj_list[triplet[0]].append([i, triplet[2]])
adj_list[triplet[2]].append([i, triplet[0]])
degrees = np.array([len(a) for a in adj_list])
adj_list = [np.array(a) for a in adj_list]
return adj_list, degrees
def sample_edge_neighborhood(adj_list, degrees, n_triplets, sample_size):
"""Sample edges by neighborhool expansion.
This guarantees that the sampled edges form a connected graph, which
may help deeper GNNs that require information from more than one hop.
"""
edges = np.zeros((sample_size), dtype=np.int32)
#initialize
sample_counts = np.array([d for d in degrees])
picked = np.array([False for _ in range(n_triplets)])
seen = np.array([False for _ in degrees])
for i in range(0, sample_size):
weights = sample_counts * seen
if np.sum(weights) == 0:
weights = np.ones_like(weights)
weights[np.where(sample_counts == 0)] = 0
probabilities = (weights) / np.sum(weights)
chosen_vertex = np.random.choice(np.arange(degrees.shape[0]),
p=probabilities)
chosen_adj_list = adj_list[chosen_vertex]
seen[chosen_vertex] = True
chosen_edge = np.random.choice(np.arange(chosen_adj_list.shape[0]))
chosen_edge = chosen_adj_list[chosen_edge]
edge_number = chosen_edge[0]
while picked[edge_number]:
chosen_edge = np.random.choice(np.arange(chosen_adj_list.shape[0]))
chosen_edge = chosen_adj_list[chosen_edge]
edge_number = chosen_edge[0]
edges[i] = edge_number
other_vertex = chosen_edge[1]
picked[edge_number] = True
sample_counts[chosen_vertex] -= 1
sample_counts[other_vertex] -= 1
seen[other_vertex] = True
return edges
def sample_edge_uniform(adj_list, degrees, n_triplets, sample_size):
"""Sample edges uniformly from all the edges."""
all_edges = np.arange(n_triplets)
return np.random.choice(all_edges, sample_size, replace=False)
def generate_sampled_graph_and_labels(triplets, sample_size, split_size,
num_rels, adj_list, degrees,
negative_rate, sampler="uniform"):
"""Get training graph and signals
First perform edge neighborhood sampling on graph, then perform negative
sampling to generate negative samples
"""
# perform edge neighbor sampling
if sampler == "uniform":
edges = sample_edge_uniform(adj_list, degrees, len(triplets), sample_size)
elif sampler == "neighbor":
edges = sample_edge_neighborhood(adj_list, degrees, len(triplets), sample_size)
else:
raise ValueError("Sampler type must be either 'uniform' or 'neighbor'.")
# relabel nodes to have consecutive node ids
edges = triplets[edges]
src, rel, dst = edges.transpose()
uniq_v, edges = np.unique((src, dst), return_inverse=True)
src, dst = np.reshape(edges, (2, -1))
relabeled_edges = np.stack((src, rel, dst)).transpose()
# negative sampling
samples, labels = negative_sampling(relabeled_edges, len(uniq_v),
negative_rate)
# further split graph, only half of the edges will be used as graph
# structure, while the rest half is used as unseen positive samples
split_size = int(sample_size * split_size)
graph_split_ids = np.random.choice(np.arange(sample_size),
size=split_size, replace=False)
src = src[graph_split_ids]
dst = dst[graph_split_ids]
rel = rel[graph_split_ids]
# build DGL graph
print("# sampled nodes: {}".format(len(uniq_v)))
print("# sampled edges: {}".format(len(src) * 2))
g, rel, norm = build_graph_from_triplets(len(uniq_v), num_rels,
(src, rel, dst))
return g, uniq_v, rel, norm, samples, labels
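A hypothetical training-side call to the sampler above; the dataset sizes and hyperparameters here are placeholders, not values from the example scripts.

```
import numpy as np

num_nodes, num_rels, num_edges = 100, 5, 1000
triplets = np.stack([np.random.randint(num_nodes, size=num_edges),
                     np.random.randint(num_rels, size=num_edges),
                     np.random.randint(num_nodes, size=num_edges)], axis=1)
adj_list, degrees = get_adj_and_degrees(num_nodes, triplets)
# keep half of the sampled edges as graph structure; the rest become positive labels
g, uniq_v, rel, norm, samples, labels = generate_sampled_graph_and_labels(
    triplets, sample_size=500, split_size=0.5, num_rels=num_rels,
    adj_list=adj_list, degrees=degrees, negative_rate=10, sampler="uniform")
```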
def comp_deg_norm(g):
g = g.local_var()
in_deg = g.in_degrees(range(g.number_of_nodes())).float().numpy()
norm = 1.0 / in_deg
norm[np.isinf(norm)] = 0
return norm
def build_graph_from_triplets(num_nodes, num_rels, triplets):
""" Create a DGL graph. The graph is bidirectional because RGCN authors
use reversed relations.
This function also generates edge type and normalization factor
(reciprocal of node incoming degree)
"""
g = dgl.graph(([], []))
g.add_nodes(num_nodes)
src, rel, dst = triplets
src, dst = np.concatenate((src, dst)), np.concatenate((dst, src))
rel = np.concatenate((rel, rel + num_rels))
edges = sorted(zip(dst, src, rel))
dst, src, rel = np.array(edges).transpose()
g.add_edges(src, dst)
norm = comp_deg_norm(g)
print("# nodes: {}, # edges: {}".format(num_nodes, len(src)))
    # norm is a fractional edge weight (1 / in-degree), so keep it floating point
    return g, rel.astype('int64'), norm.astype('float32')
def build_test_graph(num_nodes, num_rels, edges):
src, rel, dst = edges.transpose()
print("Test graph:")
return build_graph_from_triplets(num_nodes, num_rels, (src, rel, dst))
def negative_sampling(pos_samples, num_entity, negative_rate):
size_of_batch = len(pos_samples)
num_to_generate = size_of_batch * negative_rate
neg_samples = np.tile(pos_samples, (negative_rate, 1))
labels = np.zeros(size_of_batch * (negative_rate + 1), dtype=np.float32)
labels[: size_of_batch] = 1
values = np.random.randint(num_entity, size=num_to_generate)
choices = np.random.uniform(size=num_to_generate)
subj = choices > 0.5
obj = choices <= 0.5
neg_samples[subj, 0] = values[subj]
neg_samples[obj, 2] = values[obj]
return np.concatenate((pos_samples, neg_samples)), labels
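A quick shape check of the corruption scheme above (the numbers are arbitrary): every positive triplet is copied `negative_rate` times and either its subject or its object is replaced by a random entity.

```
import numpy as np

pos = np.random.randint(50, size=(500, 3))
samples, labels = negative_sampling(pos, num_entity=50, negative_rate=10)
assert samples.shape == (500 * 11, 3)  # 500 positives followed by 5000 corrupted copies
assert labels.sum() == 500             # only the original triplets are labeled 1
```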
#######################################################################
#
# Utility functions for evaluations (raw)
#
#######################################################################
def sort_and_rank(score, target):
_, indices = torch.sort(score, dim=1, descending=True)
indices = torch.nonzero(indices == target.view(-1, 1), as_tuple=False)
indices = indices[:, 1].view(-1)
return indices
def perturb_and_get_raw_rank(embedding, w, a, r, b, test_size, batch_size=100):
""" Perturb one element in the triplets
"""
n_batch = (test_size + batch_size - 1) // batch_size
ranks = []
for idx in range(n_batch):
print("batch {} / {}".format(idx, n_batch))
batch_start = idx * batch_size
batch_end = min(test_size, (idx + 1) * batch_size)
batch_a = a[batch_start: batch_end]
batch_r = r[batch_start: batch_end]
emb_ar = embedding[batch_a] * w[batch_r]
emb_ar = emb_ar.transpose(0, 1).unsqueeze(2) # size: D x E x 1
emb_c = embedding.transpose(0, 1).unsqueeze(1) # size: D x 1 x V
# out-prod and reduce sum
out_prod = torch.bmm(emb_ar, emb_c) # size D x E x V
score = torch.sum(out_prod, dim=0) # size E x V
score = torch.sigmoid(score)
target = b[batch_start: batch_end]
ranks.append(sort_and_rank(score, target))
return torch.cat(ranks)
# return MRR (raw), and Hits @ (1, 3, 10)
def calc_raw_mrr(embedding, w, test_triplets, hits=[], eval_bz=100):
with torch.no_grad():
s = test_triplets[:, 0]
r = test_triplets[:, 1]
o = test_triplets[:, 2]
test_size = test_triplets.shape[0]
# perturb subject
ranks_s = perturb_and_get_raw_rank(embedding, w, o, r, s, test_size, eval_bz)
# perturb object
ranks_o = perturb_and_get_raw_rank(embedding, w, s, r, o, test_size, eval_bz)
ranks = torch.cat([ranks_s, ranks_o])
ranks += 1 # change to 1-indexed
mrr = torch.mean(1.0 / ranks.float())
print("MRR (raw): {:.6f}".format(mrr.item()))
for hit in hits:
avg_count = torch.mean((ranks <= hit).float())
print("Hits (raw) @ {}: {:.6f}".format(hit, avg_count.item()))
return mrr.item()
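For reference, both metrics printed above are computed from 1-indexed ranks; a tiny worked example:

```
import torch

ranks = torch.tensor([1, 2, 4])                # best possible rank is 1
mrr = torch.mean(1.0 / ranks.float())          # (1 + 0.5 + 0.25) / 3 = 0.5833
hits_at_3 = torch.mean((ranks <= 3).float())   # 2 of the 3 ranks are within the top 3
```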
#######################################################################
#
# Utility functions for evaluations (filtered)
#
#######################################################################
def filter_o(triplets_to_filter, target_s, target_r, target_o, num_entities):
target_s, target_r, target_o = int(target_s), int(target_r), int(target_o)
filtered_o = []
# Do not filter out the test triplet, since we want to predict on it
if (target_s, target_r, target_o) in triplets_to_filter:
triplets_to_filter.remove((target_s, target_r, target_o))
# Do not consider an object if it is part of a triplet to filter
for o in range(num_entities):
if (target_s, target_r, o) not in triplets_to_filter:
filtered_o.append(o)
return torch.LongTensor(filtered_o)
def filter_s(triplets_to_filter, target_s, target_r, target_o, num_entities):
target_s, target_r, target_o = int(target_s), int(target_r), int(target_o)
filtered_s = []
# Do not filter out the test triplet, since we want to predict on it
if (target_s, target_r, target_o) in triplets_to_filter:
triplets_to_filter.remove((target_s, target_r, target_o))
# Do not consider a subject if it is part of a triplet to filter
for s in range(num_entities):
if (s, target_r, target_o) not in triplets_to_filter:
filtered_s.append(s)
return torch.LongTensor(filtered_s)
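A tiny illustration of the filtering helpers above, with made-up triplets: known facts other than the test triplet are dropped from the candidate list, while the test entity itself is kept. Note that the helpers mutate the filter set in place.

```
known = {(0, 0, 1), (0, 0, 2), (3, 1, 2)}
# Rank (0, 0, 1) against candidate objects 0..3: object 2 is filtered out
# because (0, 0, 2) is a known fact; the test object 1 stays in the list.
candidates = filter_o(known, target_s=0, target_r=0, target_o=1, num_entities=4)
# candidates -> tensor([0, 1, 3])
```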
def perturb_o_and_get_filtered_rank(embedding, w, s, r, o, test_size, triplets_to_filter):
""" Perturb object in the triplets
"""
num_entities = embedding.shape[0]
ranks = []
for idx in range(test_size):
if idx % 100 == 0:
print("test triplet {} / {}".format(idx, test_size))
target_s = s[idx]
target_r = r[idx]
target_o = o[idx]
filtered_o = filter_o(triplets_to_filter, target_s, target_r, target_o, num_entities)
target_o_idx = int((filtered_o == target_o).nonzero())
emb_s = embedding[target_s]
emb_r = w[target_r]
emb_o = embedding[filtered_o]
emb_triplet = emb_s * emb_r * emb_o
scores = torch.sigmoid(torch.sum(emb_triplet, dim=1))
_, indices = torch.sort(scores, descending=True)
rank = int((indices == target_o_idx).nonzero())
ranks.append(rank)
return torch.LongTensor(ranks)
def perturb_s_and_get_filtered_rank(embedding, w, s, r, o, test_size, triplets_to_filter):
""" Perturb subject in the triplets
"""
num_entities = embedding.shape[0]
ranks = []
for idx in range(test_size):
if idx % 100 == 0:
print("test triplet {} / {}".format(idx, test_size))
target_s = s[idx]
target_r = r[idx]
target_o = o[idx]
filtered_s = filter_s(triplets_to_filter, target_s, target_r, target_o, num_entities)
target_s_idx = int((filtered_s == target_s).nonzero())
emb_s = embedding[filtered_s]
emb_r = w[target_r]
emb_o = embedding[target_o]
emb_triplet = emb_s * emb_r * emb_o
scores = torch.sigmoid(torch.sum(emb_triplet, dim=1))
_, indices = torch.sort(scores, descending=True)
rank = int((indices == target_s_idx).nonzero())
ranks.append(rank)
return torch.LongTensor(ranks)
def calc_filtered_mrr(embedding, w, train_triplets, valid_triplets, test_triplets, hits=[]):
with torch.no_grad():
s = test_triplets[:, 0]
r = test_triplets[:, 1]
o = test_triplets[:, 2]
test_size = test_triplets.shape[0]
triplets_to_filter = torch.cat([train_triplets, valid_triplets, test_triplets]).tolist()
triplets_to_filter = {tuple(triplet) for triplet in triplets_to_filter}
print('Perturbing subject...')
ranks_s = perturb_s_and_get_filtered_rank(embedding, w, s, r, o, test_size, triplets_to_filter)
print('Perturbing object...')
ranks_o = perturb_o_and_get_filtered_rank(embedding, w, s, r, o, test_size, triplets_to_filter)
ranks = torch.cat([ranks_s, ranks_o])
ranks += 1 # change to 1-indexed
mrr = torch.mean(1.0 / ranks.float())
print("MRR (filtered): {:.6f}".format(mrr.item()))
for hit in hits:
avg_count = torch.mean((ranks <= hit).float())
print("Hits (filtered) @ {}: {:.6f}".format(hit, avg_count.item()))
return mrr.item()
#######################################################################
#
# Main evaluation function
#
#######################################################################
def calc_mrr(embedding, w, train_triplets, valid_triplets, test_triplets, hits=[], eval_bz=100, eval_p="filtered"):
if eval_p == "filtered":
mrr = calc_filtered_mrr(embedding, w, train_triplets, valid_triplets, test_triplets, hits)
else:
mrr = calc_raw_mrr(embedding, w, test_triplets, hits, eval_bz)
return mrr
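A hypothetical call to the evaluation entry point above; `embed` (num_nodes x dim node embeddings), `w_rel` (num_relations x dim relation weights) and the triplet LongTensors are placeholders that a training script would supply.

```
mrr = calc_mrr(embed, w_rel, train_triplets, valid_triplets, test_triplets,
               hits=[1, 3, 10], eval_bz=100, eval_p="filtered")
```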
@@ -1178,7 +1178,7 @@ def count_nonzero(input):
 # DGL should contain all the operations on index, so this set of operators
 # should be gradually removed.
-def unique(input, return_inverse=False):
+def unique(input, return_inverse=False, return_counts=False):
     """Returns the unique scalar elements in a tensor.
     Parameters
@@ -1188,13 +1188,19 @@ def unique(input, return_inverse=False):
     return_inverse : bool, optional
         Whether to also return the indices for where elements in the original
         input ended up in the returned unique list.
+    return_counts : bool, optional
+        Whether to also return the counts for each unique element.
     Returns
     -------
     Tensor
         A 1-D tensor containing unique elements.
-    Tensor
+    Tensor, optional
         A 1-D tensor containing the new positions of the elements in the input.
+        It is returned if return_inverse is True.
+    Tensor, optional
+        A 1-D tensor containing the number of occurrences for each unique value or tensor.
+        It is returned if return_counts is True.
     """
     pass
......
@@ -353,14 +353,20 @@ def count_nonzero(input):
     tmp = input.asnumpy()
     return np.count_nonzero(tmp)
-def unique(input, return_inverse=False):
+def unique(input, return_inverse=False, return_counts=False):
     # TODO: fallback to numpy is unfortunate
     tmp = input.asnumpy()
-    if return_inverse:
-        tmp, inv = np.unique(tmp, return_inverse=True)
+    if return_inverse and return_counts:
+        tmp, inv, count = np.unique(tmp, return_inverse=True, return_counts=True)
         tmp = nd.array(tmp, ctx=input.context, dtype=input.dtype)
         inv = nd.array(inv, ctx=input.context)
-        return tmp, inv
+        count = nd.array(count, ctx=input.context)
+        return tmp, inv, count
+    elif return_inverse or return_counts:
+        tmp, tmp2 = np.unique(tmp, return_inverse=return_inverse, return_counts=return_counts)
+        tmp = nd.array(tmp, ctx=input.context, dtype=input.dtype)
+        tmp2 = nd.array(tmp2, ctx=input.context)
+        return tmp, tmp2
     else:
         tmp = np.unique(tmp)
         return nd.array(tmp, ctx=input.context, dtype=input.dtype)
......
@@ -301,10 +301,10 @@ def count_nonzero(input):
     # TODO: fallback to numpy for backward compatibility
     return np.count_nonzero(input)
-def unique(input, return_inverse=False):
+def unique(input, return_inverse=False, return_counts=False):
     if input.dtype == th.bool:
         input = input.type(th.int8)
-    return th.unique(input, return_inverse=return_inverse)
+    return th.unique(input, return_inverse=return_inverse, return_counts=return_counts)
 def full_1d(length, fill_value, dtype, ctx):
     return th.full((length,), fill_value, dtype=dtype, device=ctx)
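For reference, the underlying `torch.unique` call that this backend wrapper forwards to returns sorted unique values plus the optional inverse indices and counts:

```
import torch as th

t = th.tensor([3, 1, 2, 2, 3, 3])
uniq, inv, cnt = th.unique(t, return_inverse=True, return_counts=True)
# uniq -> tensor([1, 2, 3]); inv -> tensor([2, 0, 1, 1, 2, 2]); cnt -> tensor([1, 2, 3])
```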
......
@@ -425,8 +425,13 @@ def count_nonzero(input):
     return int(tf.math.count_nonzero(input))
-def unique(input, return_inverse=False):
-    if return_inverse:
+def unique(input, return_inverse=False, return_counts=False):
+    if return_inverse and return_counts:
+        return tf.unique_with_counts(input)
+    elif return_counts:
+        result = tf.unique_with_counts(input)
+        return result.y, result.count
+    elif return_inverse:
         return tf.unique(input)
     else:
         return tf.unique(input).y
......
@@ -352,7 +352,7 @@ class FB15k237Dataset(KnowledgeGraphDataset):
     >>> graph = dataset[0]
     >>> train_mask = graph.edata['train_mask']
     >>> train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
-    >>> src, dst = graph.edges(train_idx)
+    >>> src, dst = graph.find_edges(train_idx)
     >>> rel = graph.edata['etype'][train_idx]
     - ``valid`` is deprecated, it is replaced by:
@@ -361,7 +361,7 @@ class FB15k237Dataset(KnowledgeGraphDataset):
     >>> graph = dataset[0]
     >>> val_mask = graph.edata['val_mask']
     >>> val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
-    >>> src, dst = graph.edges(val_idx)
+    >>> src, dst = graph.find_edges(val_idx)
     >>> rel = graph.edata['etype'][val_idx]
     - ``test`` is deprecated, it is replaced by:
@@ -370,7 +370,7 @@ class FB15k237Dataset(KnowledgeGraphDataset):
     >>> graph = dataset[0]
     >>> test_mask = graph.edata['test_mask']
     >>> test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
-    >>> src, dst = graph.edges(test_idx)
+    >>> src, dst = graph.find_edges(test_idx)
     >>> rel = graph.edata['etype'][test_idx]
     FB15k-237 is a subset of FB15k where inverse
......
@@ -69,7 +69,8 @@ __all__ = [
     'as_heterograph',
     'adj_product_graph',
     'adj_sum_graph',
-    'reorder_graph'
+    'reorder_graph',
+    'norm_by_dst'
 ]
@@ -3213,4 +3214,42 @@ def rcmk_perm(g):
     perm = sparse.csgraph.reverse_cuthill_mckee(csr_adj)
     return perm.copy()
+def norm_by_dst(g, etype=None):
+    r"""Calculate normalization coefficient per edge based on destination node degree.
+
+    Parameters
+    ----------
+    g : DGLGraph
+        The input graph.
+    etype : str or (str, str, str), optional
+        The type of the edges to calculate. The allowed edge type formats are:
+
+        * ``(str, str, str)`` for source node type, edge type and destination node type.
+        * or one ``str`` edge type name if the name can uniquely identify a
+          triplet format in the graph.
+
+        It can be omitted if the graph has a single edge type.
+
+    Returns
+    -------
+    1D Tensor
+        The normalization coefficient of the edges.
+
+    Examples
+    --------
+    >>> import dgl
+    >>> g = dgl.graph(([0, 1, 1], [1, 1, 2]))
+    >>> print(dgl.norm_by_dst(g))
+    tensor([0.5000, 0.5000, 1.0000])
+    """
+    _, v, _ = g.edges(form='all', etype=etype)
+    _, inv_index, count = F.unique(v, return_inverse=True, return_counts=True)
+    deg = F.astype(count[inv_index], F.float32)
+    norm = 1. / deg
+    norm = F.replace_inf_with_zero(norm)
+    return norm
+
 _init_api("dgl.transform", __name__)