Commit 7f65199a authored by xiang song (charlie.song), committed by Da Zheng

[NN]Supporting TransR in app/kg score_func (#945)

* Add TransR for kge

* Now PyTorch TransR can run

* Add MXNet TransR

* Now MXNet can work with small dim sizes

* Add test

* Pass simple test_score

* Update test with transR score func

* Update RESCAL MXNet

* Add missing funcs

* Update init func for transR score

* Revert "Update init func for transR score"

This reverts commit 0798bb886095e7581f6675da5343376844ce45b9.

* Update score func of TransR MXNet

Make it more memory-friendly and faster,
though it is still very slow and memory-consuming

* Update best config

* Fix random seed for test

* Init score-func specific var

* Update Readme
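For reference, the TransR model added in this commit projects head and tail entities into a relation-specific space with a projection matrix before taking a TransE-style L1 distance: score(h, r, t) = gamma - ||h M_r + r - t M_r||_1. A minimal NumPy sketch of that scoring rule (dimensions and gamma are illustrative, not the trained configuration):

```python
import numpy as np

def transr_score(head, rel, tail, proj, gamma=12.0):
    # Project entities from entity space into relation space via M_r,
    # then score with a TransE-style L1 distance.
    head_r = head @ proj   # (relation_dim,)
    tail_r = tail @ proj
    return gamma - np.abs(head_r + rel - tail_r).sum()

entity_dim, relation_dim = 4, 3
rng = np.random.default_rng(0)
proj = rng.normal(size=(entity_dim, relation_dim))  # M_r, one per relation
head = rng.normal(size=entity_dim)

# Sanity check: with identical head/tail and a zero relation vector,
# the distance term vanishes and the score is exactly gamma.
score = transr_score(head, np.zeros(relation_dim), head, proj)
assert abs(score - 12.0) < 1e-9
```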
parent a85382b0
@@ -19,6 +19,7 @@ DGL-KE includes the following knowledge graph embedding models:
- DistMult
- ComplEx
- RESCAL
- TransR
It will add other popular models in the future.
@@ -60,10 +61,10 @@ The speed is measured with 16 CPU cores and one Nvidia V100 GPU.
The speed on FB15k
| Models | TransE | DistMult | ComplEx | RESCAL | TransR |
|---------|--------|----------|---------|--------|--------|
|MAX_STEPS| 20000 | 100000 | 100000 | 30000 | 100000 |
|TIME | 411s | 690s | 806s | 1800s | 7627s |
The accuracy on FB15k
@@ -73,15 +74,16 @@ The accuracy on FB15k
| DistMult | 43.35 | 0.783 | 0.713 | 0.837 | 0.897 |
| ComplEx | 51.99 | 0.785 | 0.720 | 0.832 | 0.889 |
| RESCAL | 130.89| 0.668 | 0.597 | 0.720 | 0.800 |
| TransR | 138.7 | 0.501 | 0.274 | 0.704 | 0.801 |
In comparison, GraphVite uses 4 GPUs and takes 14 minutes. Thus, DGL-KE trains TransE on FB15k twice as fast as GraphVite while using much fewer resources. More performance information on GraphVite can be found [here](https://github.com/DeepGraphLearning/graphvite).
The speed on wn18
| Models | TransE | DistMult | ComplEx | RESCAL | TransR |
|---------|--------|----------|---------|--------|--------|
|MAX_STEPS| 40000 | 10000 | 20000 | 20000 | 20000 |
|TIME | 719s | 126s | 266s | 333s | 1547s |
The accuracy on wn18
@@ -91,6 +93,7 @@ The accuracy on wn18
| DistMult | 271.09 | 0.769 | 0.639 | 0.892 | 0.949 |
| ComplEx | 276.37 | 0.935 | 0.916 | 0.950 | 0.960 |
| RESCAL | 579.54 | 0.846 | 0.791 | 0.898 | 0.931 |
| TransR | 615.56 | 0.606 | 0.378 | 0.826 | 0.890 |
The speed on Freebase
......
@@ -18,6 +18,10 @@ DGLBACKEND=pytorch python3 train.py --model RESCAL --dataset FB15k --batch_size
--neg_sample_size 256 --hidden_dim 500 --gamma 24.0 --lr 0.03 --max_step 30000 \
--batch_size_eval 16 --gpu 0 --valid --test -adv
DGLBACKEND=pytorch python3 train.py --model TransR --dataset FB15k --batch_size 1024 \
--neg_sample_size 256 --hidden_dim 500 --gamma 24.0 --lr 0.01 --max_step 30000 \
--batch_size_eval 16 --gpu 0 --valid --test -adv
# for wn18
DGLBACKEND=pytorch python3 train.py --model TransE --dataset wn18 --batch_size 1024 \
@@ -37,6 +41,10 @@ DGLBACKEND=pytorch python3 train.py --model RESCAL --dataset wn18 --batch_size 1
--neg_sample_size 256 --hidden_dim 250 --gamma 24.0 --lr 0.03 --max_step 20000 \
--batch_size_eval 16 --gpu 0 --valid --test -adv
DGLBACKEND=pytorch python3 train.py --model TransR --dataset wn18 --batch_size 1024 \
--neg_sample_size 256 --hidden_dim 500 --gamma 16.0 --lr 0.1 --max_step 30000 \
--batch_size_eval 16 --gpu 0 --valid --test -adv
# for Freebase
DGLBACKEND=pytorch python3 train.py --model ComplEx --dataset Freebase --batch_size 1024 \
......
@@ -48,6 +48,10 @@ class KEModel(object):
if model_name == 'TransE':
self.score_func = TransEScore(gamma)
elif model_name == 'TransR':
projection_emb = ExternalEmbedding(args, n_relations, entity_dim * relation_dim,
F.cpu() if args.mix_cpu_gpu else device)
self.score_func = TransRScore(gamma, projection_emb, relation_dim, entity_dim)
elif model_name == 'DistMult':
self.score_func = DistMultScore()
elif model_name == 'ComplEx':
@@ -57,6 +61,8 @@ class KEModel(object):
self.head_neg_score = self.score_func.create_neg(True)
self.tail_neg_score = self.score_func.create_neg(False)
self.head_neg_prepare = self.score_func.create_neg_prepare(True)
self.tail_neg_prepare = self.score_func.create_neg_prepare(False)
self.reset_parameters()
@@ -68,12 +74,12 @@ class KEModel(object):
def save_emb(self, path, dataset):
self.entity_emb.save(path, dataset+'_'+self.model_name+'_entity')
self.relation_emb.save(path, dataset+'_'+self.model_name+'_relation')
self.score_func.save(path, dataset+'_'+self.model_name)
def load_emb(self, path, dataset):
self.entity_emb.load(path, dataset+'_'+self.model_name+'_entity')
self.relation_emb.load(path, dataset+'_'+self.model_name+'_relation')
self.score_func.load(path, dataset+'_'+self.model_name)
def reset_parameters(self):
self.entity_emb.init(self.emb_init)
@@ -96,6 +102,8 @@ class KEModel(object):
tail_ids = to_device(tail_ids, gpu_id)
tail = pos_g.ndata['emb'][tail_ids]
rel = pos_g.edata['emb']
neg_head, tail = self.head_neg_prepare(pos_g.edata['id'], num_chunks, neg_head, tail, gpu_id, trace)
neg_score = self.head_neg_score(neg_head, rel, tail,
num_chunks, chunk_size, neg_sample_size)
else:
@@ -106,6 +114,8 @@ class KEModel(object):
head_ids = to_device(head_ids, gpu_id)
head = pos_g.ndata['emb'][head_ids]
rel = pos_g.edata['emb']
head, neg_tail = self.tail_neg_prepare(pos_g.edata['id'], num_chunks, head, neg_tail, gpu_id, trace)
neg_score = self.tail_neg_score(head, rel, neg_tail,
num_chunks, chunk_size, neg_sample_size)
@@ -115,6 +125,8 @@ class KEModel(object):
pos_g.ndata['emb'] = self.entity_emb(pos_g.ndata['id'], gpu_id, False)
pos_g.edata['emb'] = self.relation_emb(pos_g.edata['id'], gpu_id, False)
self.score_func.prepare(pos_g, gpu_id, False)
batch_size = pos_g.number_of_edges()
pos_scores = self.predict_score(pos_g)
pos_scores = reshape(logsigmoid(pos_scores), batch_size, -1)
@@ -148,6 +160,8 @@ class KEModel(object):
pos_g.ndata['emb'] = self.entity_emb(pos_g.ndata['id'], gpu_id, True)
pos_g.edata['emb'] = self.relation_emb(pos_g.edata['id'], gpu_id, True)
self.score_func.prepare(pos_g, gpu_id, True)
pos_score = self.predict_score(pos_g)
pos_score = logsigmoid(pos_score)
if gpu_id >= 0:
@@ -194,3 +208,4 @@ class KEModel(object):
def update(self):
self.entity_emb.update()
self.relation_emb.update()
self.score_func.update()
@@ -15,6 +15,17 @@ class TransEScore(nn.Block):
score = head + rel - tail
return {'score': self.gamma - nd.norm(score, ord=1, axis=-1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def update(self):
pass
def reset_parameters(self):
pass
@@ -46,6 +57,125 @@ class TransEScore(nn.Block):
return gamma - nd.norm(heads - tails, ord=1, axis=-1)
return fn
class TransRScore(nn.Block):
def __init__(self, gamma, projection_emb, relation_dim, entity_dim):
super(TransRScore, self).__init__()
self.gamma = gamma
self.projection_emb = projection_emb
self.relation_dim = relation_dim
self.entity_dim = entity_dim
def edge_func(self, edges):
head = edges.data['head_emb']
tail = edges.data['tail_emb']
rel = edges.data['emb']
score = head + rel - tail
return {'score': self.gamma - nd.norm(score, ord=1, axis=-1)}
def prepare(self, g, gpu_id, trace=False):
head_ids, tail_ids = g.all_edges(order='eid')
projection = self.projection_emb(g.edata['id'], gpu_id, trace)
projection = projection.reshape(-1, self.entity_dim, self.relation_dim)
head_emb = g.ndata['emb'][head_ids.as_in_context(g.ndata['emb'].context)].expand_dims(axis=-2)
tail_emb = g.ndata['emb'][tail_ids.as_in_context(g.ndata['emb'].context)].expand_dims(axis=-2)
g.edata['head_emb'] = nd.batch_dot(head_emb, projection).squeeze()
g.edata['tail_emb'] = nd.batch_dot(tail_emb, projection).squeeze()
def create_neg_prepare(self, neg_head):
if neg_head:
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
# pos node, project to its relation
projection = self.projection_emb(rel_id, gpu_id, trace)
projection = projection.reshape(-1, self.entity_dim, self.relation_dim)
tail = tail.reshape(-1, 1, self.entity_dim)
tail = nd.batch_dot(tail, projection)
tail = tail.reshape(num_chunks, -1, self.relation_dim)
# neg node, each project to all relations
projection = projection.reshape(num_chunks, -1, self.entity_dim, self.relation_dim)
head = head.reshape(num_chunks, -1, 1, self.entity_dim)
num_rels = projection.shape[1]
num_nnodes = head.shape[1]
heads = []
for i in range(num_chunks):
head_negs = []
for j in range(num_nnodes):
head_neg = head[i][j]
head_neg = head_neg.reshape(1, 1, self.entity_dim)
head_neg = nd.broadcast_axis(head_neg, axis=0, size=num_rels)
head_neg = nd.batch_dot(head_neg, projection[i])
head_neg = head_neg.squeeze(axis=1)
head_negs.append(head_neg)
head_negs = nd.stack(*head_negs, axis=1)
heads.append(head_negs)
head = nd.stack(*heads)
return head, tail
return fn
else:
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
# pos node, project to its relation
projection = self.projection_emb(rel_id, gpu_id, trace)
projection = projection.reshape(-1, self.entity_dim, self.relation_dim)
head = head.reshape(-1, 1, self.entity_dim)
head = nd.batch_dot(head, projection).squeeze()
head = head.reshape(num_chunks, -1, self.relation_dim)
projection = projection.reshape(num_chunks, -1, self.entity_dim, self.relation_dim)
tail = tail.reshape(num_chunks, -1, 1, self.entity_dim)
num_rels = projection.shape[1]
num_nnodes = tail.shape[1]
tails = []
for i in range(num_chunks):
tail_negs = []
for j in range(num_nnodes):
tail_neg = tail[i][j]
tail_neg = tail_neg.reshape(1, 1, self.entity_dim)
tail_neg = nd.broadcast_axis(tail_neg, axis=0, size=num_rels)
tail_neg = nd.batch_dot(tail_neg, projection[i])
tail_neg = tail_neg.squeeze(axis=1)
tail_negs.append(tail_neg)
tail_negs = nd.stack(*tail_negs, axis=1)
tails.append(tail_negs)
tail = nd.stack(*tails)
return head, tail
return fn
def forward(self, g):
g.apply_edges(lambda edges: self.edge_func(edges))
def reset_parameters(self):
self.projection_emb.init(1.0)
def update(self):
self.projection_emb.update()
def save(self, path, name):
self.projection_emb.save(path, name+'projection')
def load(self, path, name):
self.projection_emb.load(path, name+'projection')
def create_neg(self, neg_head):
gamma = self.gamma
if neg_head:
def fn(heads, relations, tails, num_chunks, chunk_size, neg_sample_size):
relations = relations.reshape(num_chunks, -1, self.relation_dim)
tails = tails - relations
tails = tails.reshape(num_chunks, -1, 1, self.relation_dim)
score = heads - tails
return gamma - nd.norm(score, ord=1, axis=-1)
return fn
else:
def fn(heads, relations, tails, num_chunks, chunk_size, neg_sample_size):
relations = relations.reshape(num_chunks, -1, self.relation_dim)
heads = heads - relations
heads = heads.reshape(num_chunks, -1, 1, self.relation_dim)
score = heads - tails
return gamma - nd.norm(score, ord=1, axis=-1)
return fn
class DistMultScore(nn.Block):
def __init__(self):
super(DistMultScore, self).__init__()
@@ -58,6 +188,17 @@ class DistMultScore(nn.Block):
# TODO: check if there exists minus sign and if gamma should be used here(jin)
return {'score': nd.sum(score, axis=-1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def update(self):
pass
def reset_parameters(self):
pass
@@ -104,6 +245,17 @@ class ComplExScore(nn.Block):
# TODO: check if there exists minus sign and if gamma should be used here(jin)
return {'score': nd.sum(score, -1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def update(self):
pass
def reset_parameters(self):
pass
@@ -161,6 +313,17 @@ class RESCALScore(nn.Block):
return {'score': mx.nd.sum(score, -1)}
# return {'score': self.gamma - th.norm(score, p=1, dim=-1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def update(self):
pass
def reset_parameters(self):
pass
......
@@ -3,6 +3,8 @@ import torch.nn as nn
import torch.nn.functional as functional
import torch.nn.init as INIT
from .tensor_models import ExternalEmbedding
class TransEScore(nn.Module):
def __init__(self, gamma):
super(TransEScore, self).__init__()
@@ -15,9 +17,20 @@ class TransEScore(nn.Module):
score = head + rel - tail
return {'score': self.gamma - th.norm(score, p=1, dim=-1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def forward(self, g):
g.apply_edges(lambda edges: self.edge_func(edges))
def update(self):
pass
def reset_parameters(self):
pass
@@ -46,6 +59,94 @@ class TransEScore(nn.Module):
return gamma - th.cdist(heads, tails, p=1)
return fn
class TransRScore(nn.Module):
def __init__(self, gamma, projection_emb, relation_dim, entity_dim):
super(TransRScore, self).__init__()
self.gamma = gamma
self.projection_emb = projection_emb
self.relation_dim = relation_dim
self.entity_dim = entity_dim
def edge_func(self, edges):
head = edges.data['head_emb']
tail = edges.data['tail_emb']
rel = edges.data['emb']
score = head + rel - tail
return {'score': self.gamma - th.norm(score, p=1, dim=-1)}
def prepare(self, g, gpu_id, trace=False):
head_ids, tail_ids = g.all_edges(order='eid')
projection = self.projection_emb(g.edata['id'], gpu_id, trace)
projection = projection.reshape(-1, self.entity_dim, self.relation_dim)
g.edata['head_emb'] = th.einsum('ab,abc->ac', g.ndata['emb'][head_ids], projection)
g.edata['tail_emb'] = th.einsum('ab,abc->ac', g.ndata['emb'][tail_ids], projection)
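The einsum subscript `'ab,abc->ac'` used in `prepare` above is a per-edge vector-matrix product: for each edge i, the entity embedding row is multiplied by that edge's projection matrix. A small NumPy check of that equivalence (shapes are illustrative):

```python
import numpy as np

# einsum('ab,abc->ac', emb, proj): for each edge i, out[i] = emb[i] @ proj[i].
n_edges, entity_dim, relation_dim = 5, 4, 3
rng = np.random.default_rng(1)
emb = rng.normal(size=(n_edges, entity_dim))
proj = rng.normal(size=(n_edges, entity_dim, relation_dim))

out = np.einsum('ab,abc->ac', emb, proj)
ref = np.stack([emb[i] @ proj[i] for i in range(n_edges)])
assert np.allclose(out, ref)
assert out.shape == (n_edges, relation_dim)
```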
def create_neg_prepare(self, neg_head):
if neg_head:
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
# pos node, project to its relation
projection = self.projection_emb(rel_id, gpu_id, trace)
projection = projection.reshape(num_chunks, -1, self.entity_dim, self.relation_dim)
tail = tail.reshape(num_chunks, -1, 1, self.entity_dim)
tail = th.matmul(tail, projection)
tail = tail.reshape(num_chunks, -1, self.relation_dim)
# neg node, each project to all relations
head = head.reshape(num_chunks, 1, -1, self.entity_dim)
# (num_chunks, num_rel, num_neg_nodes, rel_dim)
head = th.matmul(head, projection)
return head, tail
return fn
else:
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
# pos node, project to its relation
projection = self.projection_emb(rel_id, gpu_id, trace)
projection = projection.reshape(num_chunks, -1, self.entity_dim, self.relation_dim)
head = head.reshape(num_chunks, -1, 1, self.entity_dim)
head = th.matmul(head, projection)
head = head.reshape(num_chunks, -1, self.relation_dim)
# neg node, each project to all relations
tail = tail.reshape(num_chunks, 1, -1, self.entity_dim)
# (num_chunks, num_rel, num_neg_nodes, rel_dim)
tail = th.matmul(tail, projection)
return head, tail
return fn
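The key trick in `create_neg_prepare` above is the broadcasted matmul: negatives are reshaped to a singleton relation axis so one call projects every negative entity under every relation in its chunk, avoiding the explicit Python loops the MXNet version needs. A NumPy sketch of that broadcast (dimension sizes are illustrative):

```python
import numpy as np

# neg: (chunks, 1, n_neg, d_e) batch of negative entities;
# proj: (chunks, n_rel, d_e, d_r) per-relation projection matrices.
# matmul broadcasts the singleton axis against n_rel, projecting every
# negative under every relation in one call.
chunks, n_rel, n_neg, d_e, d_r = 2, 3, 4, 5, 3
rng = np.random.default_rng(4)
neg = rng.normal(size=(chunks, 1, n_neg, d_e))
proj = rng.normal(size=(chunks, n_rel, d_e, d_r))

out = np.matmul(neg, proj)   # (chunks, n_rel, n_neg, d_r)
assert out.shape == (chunks, n_rel, n_neg, d_r)
# spot-check one (chunk, relation) slice against an explicit product
assert np.allclose(out[1, 2], neg[1, 0] @ proj[1, 2])
```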
def forward(self, g):
g.apply_edges(lambda edges: self.edge_func(edges))
def reset_parameters(self):
self.projection_emb.init(1.0)
def update(self):
self.projection_emb.update()
def save(self, path, name):
self.projection_emb.save(path, name+'projection')
def load(self, path, name):
self.projection_emb.load(path, name+'projection')
def create_neg(self, neg_head):
gamma = self.gamma
if neg_head:
def fn(heads, relations, tails, num_chunks, chunk_size, neg_sample_size):
relations = relations.reshape(num_chunks, -1, self.relation_dim)
tails = tails - relations
tails = tails.reshape(num_chunks, -1, 1, self.relation_dim)
score = heads - tails
return gamma - th.norm(score, p=1, dim=-1)
return fn
else:
def fn(heads, relations, tails, num_chunks, chunk_size, neg_sample_size):
relations = relations.reshape(num_chunks, -1, self.relation_dim)
heads = heads - relations
heads = heads.reshape(num_chunks, -1, 1, self.relation_dim)
score = heads - tails
return gamma - th.norm(score, p=1, dim=-1)
return fn
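In `create_neg` above, the relation is folded into the positive side before comparing against negatives: gamma - ||h' + r - t||_1 equals gamma - ||h' - (t - r)||_1, so (t - r) is precomputed once per positive edge and reused for every negative head. A quick NumPy check of that identity (values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
relation_dim, gamma = 3, 12.0
h_neg = rng.normal(size=relation_dim)
r = rng.normal(size=relation_dim)
t = rng.normal(size=relation_dim)

# direct TransR-style distance vs. the folded form used for negatives
direct = gamma - np.abs(h_neg + r - t).sum()
folded = gamma - np.abs(h_neg - (t - r)).sum()
assert np.isclose(direct, folded)
```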
class DistMultScore(nn.Module):
def __init__(self):
super(DistMultScore, self).__init__()
@@ -58,6 +159,17 @@ class DistMultScore(nn.Module):
# TODO: check if there exists minus sign and if gamma should be used here(jin)
return {'score': th.sum(score, dim=-1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def update(self):
pass
def reset_parameters(self):
pass
@@ -104,6 +216,17 @@ class ComplExScore(nn.Module):
# TODO: check if there exists minus sign and if gamma should be used here(jin)
return {'score': th.sum(score, -1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def update(self):
pass
def reset_parameters(self):
pass
@@ -164,6 +287,17 @@ class RESCALScore(nn.Module):
return {'score': th.sum(score, dim=-1)}
# return {'score': self.gamma - th.norm(score, p=1, dim=-1)}
def prepare(self, g, gpu_id, trace=False):
pass
def create_neg_prepare(self, neg_head):
def fn(rel_id, num_chunks, head, tail, gpu_id, trace=False):
return head, tail
return fn
def update(self):
pass
def reset_parameters(self):
pass
......
@@ -7,12 +7,28 @@ import dgl
backend = os.environ.get('DGLBACKEND')
if backend.lower() == 'mxnet':
import mxnet as mx
mx.random.seed(42)
np.random.seed(42)
from models.mxnet.score_fun import *
from models.mxnet.tensor_models import ExternalEmbedding
else:
import torch as th
th.manual_seed(42)
np.random.seed(42)
from models.pytorch.score_fun import *
from models.pytorch.tensor_models import ExternalEmbedding
from models.general_models import KEModel
from dataloader.sampler import create_neg_subgraph
class dotdict(dict):
"""dot.notation access to dictionary attributes"""
__getattr__ = dict.get
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
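The `dotdict` helper above gives dict entries attribute-style access; because `__getattr__` is bound to `dict.get`, a missing attribute quietly comes back as `None`. A self-contained usage sketch (repeating the class for illustration):

```python
class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get            # missing keys yield None, not AttributeError
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

args = dotdict({'gpu': -1, 'lr': 0.1})
assert args.gpu == -1                 # attribute read hits the dict key
args.max_step = 100                   # attribute write stores a dict key
assert args['max_step'] == 100
assert args.missing is None           # dict.get returns None for absent keys
```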
def generate_rand_graph(n, func_name):
arr = (sp.sparse.random(n, n, density=0.1, format='coo') != 0).astype(np.int64)
g = dgl.DGLGraph(arr, readonly=True)
@@ -24,24 +40,42 @@ def generate_rand_graph(n, func_name):
g.ndata['id'] = F.arange(0, g.number_of_nodes())
rel_ids = np.random.randint(0, num_rels, g.number_of_edges(), dtype=np.int64)
g.edata['id'] = F.tensor(rel_ids, F.int64)
# TransR has an additional projection_emb
if (func_name == 'TransR'):
args = {'gpu':-1, 'lr':0.1}
args = dotdict(args)
projection_emb = ExternalEmbedding(args, 10, 10 * 10, F.cpu())
return g, entity_emb, rel_emb, (12.0, projection_emb, 10, 10)
elif (func_name == 'TransE'):
return g, entity_emb, rel_emb, (12.0)
elif (func_name == 'RESCAL'):
return g, entity_emb, rel_emb, (10, 10)
else:
return g, entity_emb, rel_emb, None
ke_score_funcs = {'TransE': TransEScore,
'DistMult': DistMultScore,
'ComplEx': ComplExScore,
'RESCAL': RESCALScore,
'TransR': TransRScore}
class BaseKEModel:
def __init__(self, score_func, entity_emb, rel_emb):
self.score_func = score_func
self.head_neg_score = self.score_func.create_neg(True)
self.tail_neg_score = self.score_func.create_neg(False)
self.head_neg_prepare = self.score_func.create_neg_prepare(True)
self.tail_neg_prepare = self.score_func.create_neg_prepare(False)
self.entity_emb = entity_emb
self.rel_emb = rel_emb
# init score_func specific data if needed
self.score_func.reset_parameters()
def predict_score(self, g):
g.ndata['emb'] = self.entity_emb[g.ndata['id']]
g.edata['emb'] = self.rel_emb[g.edata['id']]
self.score_func.prepare(g, -1, False)
self.score_func(g)
return g.edata['score']
@@ -59,6 +93,8 @@ class BaseKEModel:
_, tail_ids = pos_g.all_edges(order='eid')
tail = pos_g.ndata['emb'][tail_ids]
rel = pos_g.edata['emb']
neg_head, tail = self.head_neg_prepare(pos_g.edata['id'], num_chunks, neg_head, tail, -1, False)
neg_score = self.head_neg_score(neg_head, rel, tail,
num_chunks, chunk_size, neg_sample_size)
else:
@@ -67,6 +103,8 @@ class BaseKEModel:
head_ids, _ = pos_g.all_edges(order='eid')
head = pos_g.ndata['emb'][head_ids]
rel = pos_g.edata['emb']
head, neg_tail = self.tail_neg_prepare(pos_g.edata['id'], num_chunks, head, neg_tail, -1, False)
neg_score = self.tail_neg_score(head, rel, neg_tail,
num_chunks, chunk_size, neg_sample_size)
@@ -75,9 +113,15 @@ class BaseKEModel:
def check_score_func(func_name):
batch_size = 10
neg_sample_size = 10
g, entity_emb, rel_emb, args = generate_rand_graph(100, func_name)
hidden_dim = entity_emb.shape[1]
ke_score_func = ke_score_funcs[func_name]
if args is None:
ke_score_func = ke_score_func()
elif type(args) is tuple:
ke_score_func = ke_score_func(*list(args))
else:
ke_score_func = ke_score_func(args)
model = BaseKEModel(ke_score_func, entity_emb, rel_emb)
EdgeSampler = getattr(dgl.contrib.sampling, 'EdgeSampler')
@@ -99,9 +143,24 @@ def check_score_func(func_name):
np.testing.assert_allclose(F.asnumpy(score1), F.asnumpy(score2),
rtol=1e-5, atol=1e-5)
def test_score_func_transe():
check_score_func('TransE')
def test_score_func_distmult():
check_score_func('DistMult')
def test_score_func_complex():
check_score_func('ComplEx')
def test_score_func_rescal():
check_score_func('RESCAL')
def test_score_func_transr():
check_score_func('TransR')
if __name__ == '__main__':
test_score_func_transe()
test_score_func_distmult()
test_score_func_complex()
test_score_func_rescal()
test_score_func_transr()