Commit e557ed89 authored by Minjie Wang, committed by Lingfan Yu

[Doc][Model] example folder doc and model summary (#259)

* update example readme

* mx example readme

* add results in readme

* mx rgcn readme
parent 899d125b
# Model Examples using DGL (w/ MXNet backend)
Each model is hosted in its own folder. Please read its README.md to see how to
run it.
To understand step by step how these models are implemented in DGL, check out our
[tutorials](https://docs.dgl.ai/tutorials/models/index.html)
@@ -9,17 +9,17 @@ Two extra python packages are needed for this example:
Example code was tested with rdflib 4.2.2 and pandas 0.23.4
### Entity Classification
AIFB: accuracy 97.22% (DGL), 95.83% (paper)
```
DGLBACKEND=mxnet python entity_classify.py -d aifb --testing --gpu 0
```
MUTAG: accuracy 76.47% (DGL), 73.23% (paper)
```
DGLBACKEND=mxnet python entity_classify.py -d mutag --l2norm 5e-4 --n-bases 40 --testing --gpu 0
```
BGS: accuracy 79.31% (DGL, n-bases=20, OOM when >20), 83.10% (paper)
```
DGLBACKEND=mxnet python entity_classify.py -d bgs --l2norm 5e-4 --n-bases 20 --testing --gpu 0 --relabel
```
# Model Examples using DGL (w/ PyTorch backend)
Each model is hosted in its own folder. Please read its README.md to see how to
run it.

To understand step by step how these models are implemented in DGL, check out our
[tutorials](https://docs.dgl.ai/tutorials/models/index.html)
## Model summary
Here is a summary of the model accuracy and training speed. Our testbed is an Amazon EC2 p3.2xlarge instance (with a V100 GPU).
| Model | Reported <br> Accuracy | DGL <br> Accuracy | Author's training speed (epoch time) | DGL speed (epoch time) | Improvement |
| ----- | ----------------- | ------------ | ------------------------------------ | ---------------------- | ----------- |
| GCN | 81.5% | 81.0% | 0.0051s (TF) | 0.0042s | 1.17x |
| TreeLSTM | 51.0% | 51.72% | 14.02s (DyNet) | 3.18s | 4.3x |
| R-GCN <br> (classification) | 73.23% | 73.53% | 0.2853s (Theano) | 0.0273s | 10.4x |
| R-GCN <br> (link prediction) | 0.158 (MRR) | 0.151 (MRR) | 2.204s (TF) | 0.633s | 3.5x |
| JTNN | 96.44% | 96.44% | 1826s (PyTorch) | 743s | 2.5x |
| LGNN | 94% | 94% | n/a | 1.45s | n/a |
| DGMG | 84% | 90% | n/a | 1 hr | n/a |
@@ -5,52 +5,20 @@ Graph Convolutional Networks (GCN)
- Author's code repo: [https://github.com/tkipf/gcn](https://github.com/tkipf/gcn). Note that the original code is
implemented with TensorFlow for the paper.
Codes
-----
The folder contains two implementations of GCN. `gcn_batch.py` uses user-defined
message and reduce functions. `gcn_spmv.py` uses DGL's builtin functions so that
the SPMV optimization can be applied.

Results
-------
Run with the following (available datasets: "cora", "citeseer", "pubmed"):
```bash
python gcn_spmv.py --dataset cora --gpu 0
```
* cora: ~0.810 (0.79-0.83) (paper: 0.815)
* citeseer: 0.707 (paper: 0.703)
* pubmed: 0.792 (paper: 0.790)

Batched GCN (gcn_batch.py)
-----------
Defining the model on only one node and edge makes it hard to fully utilize GPUs. As a result, we allow users to define the model on a *batch of* nodes and edges.
* The message function `gcn_msg` computes the messages for a batch of edges. Here, `edge.src` holds the batched representation of the source endpoints of the edges; the function simply returns the source node representations.
```python
def gcn_msg(edge):
    # edge.src contains the node data on the source nodes.
    # It's a dictionary; edge.src['h'] is a tensor of shape (B, D),
    # where B is the number of edges being batched.
    return {'m': edge.src['h']}
```
* The reduce function `gcn_reduce` likewise accumulates messages for a batch of nodes. The messages are batched on the second dimension of the `node.mailbox` tensors, which corresponds to the neighbors of the nodes:
```python
def gcn_reduce(node):
    # node.mailbox contains the messages from `gcn_msg`.
    # node.mailbox['m'] is a tensor of shape (B, deg, D): B is the number of
    # nodes in the batch, deg is the number of messages, and D is the message
    # dimension. DGL guarantees that all the nodes in a batch have the same
    # in-degree (through "degree bucketing"), so reducing on the second
    # dimension is equivalent to summing up all the incoming messages.
    return {'accum': mx.nd.sum(node.mailbox['m'], 1)}
```
* The update module `NodeUpdateModule` computes the new node representation `h` by applying a non-linear transformation to the reduced messages.
```python
class NodeUpdateModule(gluon.Block):
    def __init__(self, out_feats, activation=None):
        super(NodeUpdateModule, self).__init__()
        self.linear = gluon.nn.Dense(out_feats, activation=activation)

    def forward(self, node):
        accum = self.linear(node.data['accum'])
        return {'h': mx.nd.concat(node.data['h'], accum, dim=1)}
```
After defining the functions on each node/edge, the message passing is triggered by calling `update_all` on the DGLGraph object (in the GCN module).
```python
self.g.update_all(gcn_msg, gcn_reduce, layer)
```

Batched GCN with spMV optimization (gcn_spmv.py)
-----------
Batched computation is much more efficient than the naive vertex-centric approach, but it is still not ideal. For example, the message function needs to look up source node data and save it on the edges; such lookups are very common and incur extra memory copies. In fact, the message and reduce phases of the GCN model can be fused into one sparse-matrix-vector multiplication (spMV). DGL therefore provides many builtin message/reduce functions, so it can detect such optimization opportunities. In gcn_spmv.py, the user only needs to write the update module and trigger the message passing as follows:
```python
self.g.update_all(fn.copy_src(src='h', out='m'), fn.sum(msg='m', out='h'), layer)
```
Here, `fn.copy_src` and `fn.sum` are builtin message and reduce functions that perform the same operations as `gcn_msg` and `gcn_reduce` in gcn_batch.py.
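
For reference, here is a minimal sketch of how these pieces compose into a single layer. It is not code from this repo: the `GCNLayer` class, the two-argument `update_all` call, and the `g.ndata` accessor are assumptions about the API.
```python
from mxnet import gluon
import dgl.function as fn

class GCNLayer(gluon.Block):
    """Hypothetical single GCN layer built on DGL's builtin spMV path."""
    def __init__(self, out_feats, activation=None):
        super(GCNLayer, self).__init__()
        self.linear = gluon.nn.Dense(out_feats, activation=activation)

    def forward(self, g, h):
        g.ndata['h'] = h
        # fused message/reduce: copy source features into message 'm',
        # then sum the incoming messages back into 'h'
        g.update_all(fn.copy_src(src='h', out='m'),
                     fn.sum(msg='m', out='h'))
        return self.linear(g.ndata['h'])
```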
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 9 13:34:38 2018
@author: ivabruge
GeniePath: Graph Neural Networks with Adaptive Receptive Paths
Paper: https://arxiv.org/abs/1802.00910
This model uses an LSTM on the node reductions of the message-passing step.
We store the network states at each graph node, since the LSTM variables are not transmitted.
"""
from dgl.graph import DGLGraph
import torch
import torch.nn as nn
import torch.nn.functional as F
import argparse
from dataset import load_data, preprocess_features
class NodeReduceModule(nn.Module):
def __init__(self, input_dim, num_hidden, num_heads=3, input_dropout=None,
attention_dropout=None, act=lambda x: F.softmax(F.leaky_relu(x), dim=0)):
super(NodeReduceModule, self).__init__()
self.num_heads = num_heads
self.input_dropout = input_dropout
self.attention_dropout = attention_dropout
self.act = act
self.fc = nn.ModuleList(
[nn.Linear(input_dim, num_hidden, bias=False)
for _ in range(num_heads)])
self.attention = nn.ModuleList(
[nn.Linear(num_hidden * 2, 1, bias=False) for _ in range(num_heads)])
def forward(self, msgs):
src, dst = zip(*msgs)
hu = torch.cat(src, dim=0) # neighbor repr
hv = torch.cat(dst, dim=0)
msgs_repr = []
# iterate for each head
for i in range(self.num_heads):
# calc W*hself and W*hneigh
hvv = self.fc[i](hv)
huu = self.fc[i](hu)
# calculate W*hself||W*hneigh
h = torch.cat((hvv, huu), dim=1)
a = self.act(self.attention[i](h))
            if self.attention_dropout is not None:
                a = F.dropout(a, self.attention_dropout)
            if self.input_dropout is not None:
                huu = F.dropout(huu, self.input_dropout)
            # attention-weighted sum over the transformed neighbor
            # representations (huu, not the repeated self representation hvv)
            h = torch.sum(a * huu, 0, keepdim=True)
msgs_repr.append(h)
return msgs_repr
class NodeUpdateModule(nn.Module):
def __init__(self, residual, fc, act, aggregator, lstm_size=0):
super(NodeUpdateModule, self).__init__()
self.residual = residual
self.fc = fc
self.act = act
self.aggregator = aggregator
if lstm_size:
self.lstm = nn.LSTM(input_size=lstm_size, hidden_size=lstm_size, num_layers=1)
        else:
            self.lstm = None
def forward(self, node, msgs_repr):
# apply residual connection and activation for each head
for i in range(len(msgs_repr)):
if self.residual:
h = self.fc[i](node['h'])
msgs_repr[i] = msgs_repr[i] + h
if self.act is not None:
msgs_repr[i] = self.act(msgs_repr[i])
# aggregate multi-head results
h = self.aggregator(msgs_repr)
        if self.lstm is not None:
            # initialize the LSTM state from the node's stored state, or with zeros
            if node['c'] is None:
                c0 = torch.zeros(h.shape)
            else:
                c0 = node['c']
            if node['h_i'] is None:
                h0 = torch.zeros(h.shape)
            else:
                h0 = node['h_i']
            # add a dimension to handle sequential input (a sequence of length 1)
            h, (h_i, c) = self.lstm(h.unsqueeze(0), (h0.unsqueeze(0), c0.unsqueeze(0)))
            # remove the sequential dim
            h = torch.squeeze(h, 0)
            h_i = torch.squeeze(h_i, 0)
            c = torch.squeeze(c, 0)
            return {'h': h, 'c': c, 'h_i': h_i}
        else:
            return {'h': h, 'c': None, 'h_i': None}
class GeniePath(nn.Module):
def __init__(self, num_layers, in_dim, num_hidden, num_classes, num_heads,
activation, input_dropout, attention_dropout, use_residual=False ):
super(GeniePath, self).__init__()
self.input_dropout = input_dropout
self.reduce_layers = nn.ModuleList()
self.update_layers = nn.ModuleList()
# hidden layers
for i in range(num_layers):
if i == 0:
last_dim = in_dim
residual = False
else:
last_dim = num_hidden * num_heads # because of concat heads
residual = use_residual
self.reduce_layers.append(
NodeReduceModule(last_dim, num_hidden, num_heads, input_dropout,
attention_dropout))
self.update_layers.append(
NodeUpdateModule(residual, self.reduce_layers[-1].fc, activation,
lambda x: torch.cat(x, 1), num_hidden * num_heads))
# projection
self.reduce_layers.append(
NodeReduceModule(num_hidden * num_heads, num_classes, 1, input_dropout,
attention_dropout))
self.update_layers.append(
NodeUpdateModule(False, self.reduce_layers[-1].fc, None, sum))
def forward(self, g):
g.register_message_func(lambda src, dst, edge: (src['h'], dst['h']))
        # initialize the LSTM states once; they persist across layers
        for n in g.nodes():
            g.node[n]['c'] = None
            g.node[n]['h_i'] = None
        for reduce_func, update_func in zip(self.reduce_layers, self.update_layers):
            # apply dropout
            if self.input_dropout is not None:
                # TODO (lingfan): use batched dropout once we have better api
                # for global manipulation
                for n in g.nodes():
                    g.node[n]['h'] = F.dropout(g.node[n]['h'], p=self.input_dropout)
            g.register_reduce_func(reduce_func)
            g.register_update_func(update_func)
            g.update_all()
logits = [g.node[n]['h'] for n in g.nodes()]
logits = torch.cat(logits, dim=0)
return logits
    # Train on graph g with the given features and target labels. Accepts a loss
    # class and an optimizer class that implements optimizer.step().
    def train(self, g, features, labels, epochs, loss_f=torch.nn.CrossEntropyLoss,
              loss_params={}, optimizer=torch.optim.Adam, optimizer_parameters=None,
              lr=0.001, ignore=[0], quiet=False):
        # convert one-hot labels to a tensor of class indices
        labels = torch.LongTensor(labels)
        _, labels = torch.max(labels, dim=1)
        if optimizer_parameters is None:
            optimizer_parameters = self.parameters()
        # instantiate the optimizer on the given parameters
        optimizer_f = optimizer(optimizer_parameters, lr)
        for epoch in range(epochs):
# reset grad
optimizer_f.zero_grad()
# reset graph states
for n in g.nodes():
g.node[n]['h'] = torch.FloatTensor(features[n].toarray())
            # forward
            logits = self.forward(g)
            # instantiate the loss on the passed parameters (e.g. class-weight
            # params); CrossEntropyLoss expects the raw logits forward() returns
            loss = loss_f(**loss_params)
            # trim ignored (null) labels
            idx = [i for i, a in enumerate(labels) if a not in ignore]
            logits = logits[idx, :]
            labels = labels[idx]
            out = loss(logits, labels)
            if not quiet:
                print("epoch {} loss: {}".format(epoch, out.item()))
            out.backward()
            optimizer_f.step()
def main(args):
# dropout parameters
input_dropout = args.idrop
attention_dropout = args.adrop
# load and preprocess dataset
adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(args.dataset)
features = preprocess_features(features)
# initialize graph
g = DGLGraph(adj)
# create model
model = GeniePath(args.num_layers,
features.shape[1],
args.num_hidden,
y_train.shape[1],
args.num_heads,
F.elu,
input_dropout,
attention_dropout,
args.residual)
    model.train(g, features, y_train, epochs=args.epochs, lr=args.lr)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='GeniePath')
parser.add_argument("--dataset", type=str, required=True,
help="dataset name")
parser.add_argument("--epochs", type=int, default=10,
help="training epoch")
parser.add_argument("--num-heads", type=int, default=3,
help="number of attentional heads to use")
parser.add_argument("--num-layers", type=int, default=1,
help="number of hidden layers")
parser.add_argument("--num-hidden", type=int, default=8,
help="size of hidden units")
parser.add_argument("--residual", action="store_true",
help="use residual connection")
parser.add_argument("--lr", type=float, default=0.001,
help="learning rate")
parser.add_argument("--idrop", type=float, default=0.2,
help="Input dropout")
parser.add_argument("--adrop", type=float, default=0.2,
help="attention dropout")
args = parser.parse_args()
print(args)
main(args)
"""Molecular GCN model proposed by Kearnes et al. (2016).
We use the description from "Neural Message Passing for Quantum Chemistry" Sec.2.
The model has an edge representation e_vw that is updated during message passing.
The message function is:
- M(h_v, h_w, e_vw) = e_vw
The update function is:
- U_v(h_v, m_v) = Affine(Affine(h_v) || m_v)
The edge update function is:
- U_e(e_vw, h_v, h_w) = Affine(ReLU(W_e || e_vw) || Affine(h_v || h_w))
"""
import torch as T
import torch.nn as nn
import torch.nn.functional as F
import dgl
class NodeUpdateModule(nn.Module):
    def __init__(self, hv_dims, he_dims):
        super(NodeUpdateModule, self).__init__()
        # Affine(h_v)
        self.net1 = nn.Sequential(
            nn.Linear(hv_dims, hv_dims),
            nn.ReLU()
        )
        # Affine(Affine(h_v) || m_v); the messages are edge reprs of size he_dims
        self.net2 = nn.Sequential(
            nn.Linear(hv_dims + he_dims, hv_dims),
            nn.ReLU()
        )

    def forward(self, node, msgs):
        # average the incoming messages
        m = T.stack(msgs).mean(0)
        new_h = self.net2(T.cat([self.net1(node['hv']), m], dim=1))
        return {'hv': new_h}
class MessageModule(nn.Module):
    def __init__(self):
        super(MessageModule, self).__init__()

    def forward(self, src, dst, edge):
        # M(h_v, h_w, e_vw) = e_vw: the message is just the edge representation
        return edge['he']
class EdgeUpdateModule(nn.Module):
    def __init__(self, hv_dims, he_dims):
        super(EdgeUpdateModule, self).__init__()
        # An additive variant of U_e: project both endpoint reprs and the old
        # edge repr to he_dims and sum them.
        self.net1 = nn.Sequential(
            nn.Linear(hv_dims, he_dims),
            nn.ReLU()
        )
        self.net2 = nn.Sequential(
            nn.Linear(hv_dims, he_dims),
            nn.ReLU()
        )
        self.net3 = nn.Sequential(
            nn.Linear(he_dims, he_dims),
            nn.ReLU()
        )

    def forward(self, src, dst, edge):
        new_he = self.net1(src['hv']) + self.net2(dst['hv']) + self.net3(edge['he'])
        return {'he': new_he}
# TODO: we don't need this one anymore
class EdgeModule(nn.Module):
    def __init__(self, hv_dims, he_dims):
        super(EdgeModule, self).__init__()
        # use a flag to alternate between the message module and the edge update module
        self.is_msg = True
        self.msg_mod = MessageModule()
        self.upd_mod = EdgeUpdateModule(hv_dims, he_dims)

    def forward(self, src, dst, edge):
        if self.is_msg:
            self.is_msg = False
            return self.msg_mod(src, dst, edge)
        else:
            self.is_msg = True
            return self.upd_mod(src, dst, edge)
def train(g, hv_dims=16, he_dims=8):
    # TODO(minjie): finish the complete training algorithm.
    # hv_dims/he_dims are placeholder sizes for the node/edge representations.
    g = dgl.DGLGraph(g)
    g.register_message_func(MessageModule())
    g.register_edge_func(EdgeUpdateModule(hv_dims, he_dims))
    g.register_update_func(NodeUpdateModule(hv_dims, he_dims))
    # TODO(minjie): init hv and he
    num_iter = 10
    for i in range(num_iter):
        # Each call triggers the message function and updates all the nodes.
        g.update_all()
        # A second pass would update all the edge features.
        # g.send_all()
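
# Since the training routine above is still a TODO, the following is a minimal,
# hypothetical driver showing how the prototype might be exercised. It assumes
# the early register_*/update_all API used in this file and networkx-style
# per-node storage as seen in the GeniePath example; the random graph, the
# feature sizes, and the g.edge accessor are illustrative, not from the repo.
#
# import networkx as nx
# import torch as T
#
# hv_dims, he_dims = 16, 8
# g = dgl.DGLGraph(nx.erdos_renyi_graph(10, 0.3))
# # init hv and he (the TODO above) with random features
# for n in g.nodes():
#     g.node[n]['hv'] = T.randn(1, hv_dims)
# for u, v in g.edges():
#     g.edge[u][v]['he'] = T.randn(1, he_dims)  # hypothetical edge accessor
# g.register_message_func(MessageModule())
# g.register_edge_func(EdgeUpdateModule(hv_dims, he_dims))
# g.register_update_func(NodeUpdateModule(hv_dims, he_dims))
# g.update_all()  # one round of message passing refreshes every node's 'hv'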
@@ -9,23 +9,23 @@ Two extra python packages are needed for this example:
Example code was tested with rdflib 4.2.2 and pandas 0.23.4
### Entity Classification
AIFB: accuracy 97.22% (DGL), 95.83% (paper)
```
python3 entity_classify.py -d aifb --testing --gpu 0
```
MUTAG: accuracy 75% (DGL), 73.23% (paper)
```
python3 entity_classify.py -d mutag --l2norm 5e-4 --n-bases 30 --testing --gpu 0
```
BGS: accuracy 82.76% (DGL), 83.10% (paper)
```
python3 entity_classify.py -d bgs --l2norm 5e-4 --n-bases 40 --testing --gpu 0 --relabel
```
### Link Prediction
FB15k-237: MRR 0.151 (DGL), 0.158 (paper)
```
python3 link_predict.py -d FB15k-237 --gpu 0
```
@@ -17,7 +17,3 @@ unzip glove.840B.300d.zip
```
python train.py --gpu 0
```
## Speed Test
See https://docs.google.com/spreadsheets/d/1eCQrVn7g0uWriz63EbEDdes2ksMdKdlbWMyT8PSU4rc .