Unverified Commit e9b624fe authored by Minjie Wang, committed by GitHub

Merge branch 'master' into dist_part

parents 8086d1ed a88e7f7e
# Accuracy across 10 runs: 0.7788 ± 0.002227 # Accuracy across 10 runs: 0.7788 ± 0.002227
version: 0.0.1 version: 0.0.2
pipeline_name: nodepred pipeline_name: nodepred
pipeline_mode: train pipeline_mode: train
device: cuda:0 device: cuda:0
......
# Accuracy across 10 runs: 0.7826 ± 0.004317 # Accuracy across 10 runs: 0.7826 ± 0.004317
version: 0.0.1 version: 0.0.2
pipeline_name: nodepred pipeline_name: nodepred
pipeline_mode: train pipeline_mode: train
device: cuda:0 device: cuda:0
......
# Accuracy across 10 runs: 0.7819 ± 0.003176 # Accuracy across 10 runs: 0.7819 ± 0.003176
version: 0.0.1 version: 0.0.2
pipeline_name: nodepred pipeline_name: nodepred
pipeline_mode: train pipeline_mode: train
device: cuda:0 device: cuda:0
......
...@@ -4,7 +4,7 @@ from setuptools import find_packages ...@@ -4,7 +4,7 @@ from setuptools import find_packages
from distutils.core import setup from distutils.core import setup
setup(name='dglgo', setup(name='dglgo',
version='0.0.1', version='0.0.2',
description='DGL', description='DGL',
author='DGL Team', author='DGL Team',
author_email='wmjlyjemaine@gmail.com', author_email='wmjlyjemaine@gmail.com',
......
version: 0.0.1 version: 0.0.2
pipeline_name: nodepred pipeline_name: nodepred
pipeline_mode: train pipeline_mode: train
device: cpu device: cpu
......
...@@ -132,6 +132,7 @@ under the ``dgl`` namespace. ...@@ -132,6 +132,7 @@ under the ``dgl`` namespace.
DGLGraph.add_self_loop DGLGraph.add_self_loop
DGLGraph.remove_self_loop DGLGraph.remove_self_loop
DGLGraph.to_simple DGLGraph.to_simple
DGLGraph.to_cugraph
DGLGraph.reorder_graph DGLGraph.reorder_graph
Adjacency and incidence matrix Adjacency and incidence matrix
......
...@@ -18,6 +18,7 @@ Operators for constructing :class:`DGLGraph` from raw data formats. ...@@ -18,6 +18,7 @@ Operators for constructing :class:`DGLGraph` from raw data formats.
graph graph
heterograph heterograph
from_cugraph
from_scipy from_scipy
from_networkx from_networkx
bipartite_from_scipy bipartite_from_scipy
...@@ -93,6 +94,7 @@ Operators for generating new graphs by manipulating the structure of the existin ...@@ -93,6 +94,7 @@ Operators for generating new graphs by manipulating the structure of the existin
to_bidirected to_bidirected
to_bidirected_stale to_bidirected_stale
to_block to_block
to_cugraph
to_double to_double
to_float to_float
to_half to_half
......
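The newly listed ``to_cugraph`` / ``from_cugraph`` converters bridge DGL and cuGraph. Below is a minimal round-trip sketch, assuming a CUDA build of DGL with cuGraph installed (the toy graph and device are placeholders, and only the graph structure is expected to carry over):

.. code:: python

    import torch
    import dgl

    # Build a small directed graph and move it to the GPU (cuGraph is GPU-only).
    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0]))).to('cuda')

    cug = g.to_cugraph()        # DGLGraph -> cugraph.Graph
    g2 = dgl.from_cugraph(cug)  # cugraph.Graph -> DGLGraph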
...@@ -24,11 +24,13 @@ API that is exposed to python is only a few lines of codes: ...@@ -24,11 +24,13 @@ API that is exposed to python is only a few lines of codes:
#include <dgl/runtime/packed_func.h> #include <dgl/runtime/packed_func.h>
#include <dgl/runtime/registry.h> #include <dgl/runtime/registry.h>
using namespace dgl::runtime;
DGL_REGISTER_GLOBAL("calculator.MyAdd") DGL_REGISTER_GLOBAL("calculator.MyAdd")
.set_body([] (DGLArgs args, DGLRetValue* rv) { .set_body([] (DGLArgs args, DGLRetValue* rv) {
int a = args[0]; int a = args[0];
int b = args[1]; int b = args[1];
*rv = a * b; *rv = a + b;
}); });
Compile and build the library. On the python side, create a Compile and build the library. On the python side, create a
......
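For illustration only: once the library is rebuilt, the registered ``calculator.MyAdd`` function can be looked up from Python through DGL's TVM-style FFI. The helper used below (``get_global_func``) is an assumption based on DGL's FFI layout and is not part of this diff; verify the import path against your DGL version.

.. code:: python

    # Assumed helper from DGL's TVM-derived FFI; not shown in this diff.
    from dgl._ffi.function import get_global_func

    my_add = get_global_func("calculator.MyAdd")
    print(my_add(3, 4))  # expected: 7, matching the corrected `a + b` body above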
...@@ -60,7 +60,7 @@ Using CUDA UVA-based neighborhood sampling in DGL data loaders ...@@ -60,7 +60,7 @@ Using CUDA UVA-based neighborhood sampling in DGL data loaders
For the case where the graph is too large to fit onto the GPU memory, we introduce the For the case where the graph is too large to fit onto the GPU memory, we introduce the
CUDA UVA (Unified Virtual Addressing)-based sampling, in which GPUs perform the sampling CUDA UVA (Unified Virtual Addressing)-based sampling, in which GPUs perform the sampling
on the graph pinned on CPU memory via zero-copy access. on the graph pinned in CPU memory via zero-copy access.
You can enable UVA-based neighborhood sampling in DGL data loaders via: You can enable UVA-based neighborhood sampling in DGL data loaders via:
* Put the ``train_nid`` onto GPU. * Put the ``train_nid`` onto GPU.
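A minimal sketch of a UVA-enabled data loader, assuming DGL 0.8+ where ``dgl.dataloading.DataLoader`` accepts a ``use_uva`` flag (the random graph, fanouts, and batch size below are placeholders):

.. code:: python

    import torch
    import dgl

    g = dgl.rand_graph(10000, 200000)           # graph stays in (pinned) CPU memory
    train_nid = torch.arange(1000).to('cuda')   # seed nodes live on the GPU
    sampler = dgl.dataloading.NeighborSampler([10, 10])
    dataloader = dgl.dataloading.DataLoader(
        g, train_nid, sampler,
        device='cuda',      # sampled blocks are produced on the GPU
        use_uva=True,       # GPU samples via zero-copy access to pinned CPU memory
        batch_size=1024, shuffle=True, drop_last=False)
    for input_nodes, output_nodes, blocks in dataloader:
        pass  # training step goes here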
...@@ -99,6 +99,38 @@ especially for multi-GPU training. ...@@ -99,6 +99,38 @@ especially for multi-GPU training.
Refer to our `GraphSAGE example <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/multi_gpu_node_classification.py>`_ for more details. Refer to our `GraphSAGE example <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/multi_gpu_node_classification.py>`_ for more details.
UVA and GPU support for PinSAGESampler/RandomWalkNeighborSampler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PinSAGESampler and RandomWalkNeighborSampler support UVA and GPU sampling.
You can enable them via:
* Pin the graph (for UVA sampling) or put the graph onto GPU (for GPU sampling).
* Put the ``train_nid`` onto GPU.
.. code:: python
g = dgl.heterograph({
('item', 'bought-by', 'user'): ([0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 2, 3, 2, 3]),
('user', 'bought', 'item'): ([0, 1, 0, 1, 2, 3, 2, 3], [0, 0, 1, 1, 2, 2, 3, 3])})
# UVA setup
# g.create_formats_()
# g.pin_memory_()
# GPU setup
device = torch.device('cuda:0')
g = g.to(device)
sampler1 = dgl.sampling.PinSAGESampler(g, 'item', 'user', 4, 0.5, 3, 2)
sampler2 = dgl.sampling.RandomWalkNeighborSampler(g, 4, 0.5, 3, 2, ['bought-by', 'bought'])
train_nid = torch.tensor([0, 2], dtype=g.idtype, device=device)
sampler1(train_nid)
sampler2(train_nid)
Using GPU-based neighbor sampling with DGL functions Using GPU-based neighbor sampling with DGL functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...@@ -106,8 +138,7 @@ You can build your own GPU sampling pipelines with the following functions that ...@@ -106,8 +138,7 @@ You can build your own GPU sampling pipelines with the following functions that
operate on GPU: operate on GPU:
* :func:`dgl.sampling.sample_neighbors` * :func:`dgl.sampling.sample_neighbors`
* :func:`dgl.sampling.random_walk`
* Only has support for uniform sampling; non-uniform sampling can only run on CPU.
Subgraph extraction ops: Subgraph extraction ops:
......
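As a quick illustration of the ops listed above, ``dgl.sampling.sample_neighbors`` can be called directly on a GPU graph; the toy random graph below is an assumption for the sketch.

.. code:: python

    import torch
    import dgl

    g = dgl.rand_graph(100, 1000).to('cuda')   # graph resides on the GPU
    seeds = torch.arange(10, device='cuda')
    # Uniformly sample up to 5 in-neighbors per seed node; sampling runs on the GPU.
    sg = dgl.sampling.sample_neighbors(g, seeds, 5)
    print(sg.num_edges())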
...@@ -2,59 +2,36 @@ ...@@ -2,59 +2,36 @@
Chapter 8: Mixed Precision Training Chapter 8: Mixed Precision Training
=================================== ===================================
DGL is compatible with `PyTorch's automatic mixed precision package DGL is compatible with the `PyTorch Automatic Mixed Precision (AMP) package
<https://pytorch.org/docs/stable/amp.html>`_ <https://pytorch.org/docs/stable/amp.html>`_
for mixed precision training, thus saving both training time and GPU memory for mixed precision training, thus saving both training time and GPU memory
consumption. To enable this feature, users need to install PyTorch 1.6+ with python 3.7+ and consumption. This feature requires DGL 0.9+.
build DGL from source file to support ``float16`` data type (this feature is
still in its beta stage and we do not provide official pre-built pip wheels).
Installation
------------
First download DGL's source code from GitHub and build the shared library
with flag ``USE_FP16=ON``.
.. code:: bash
git clone --recurse-submodules https://github.com/dmlc/dgl.git
cd dgl
mkdir build
cd build
cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
make -j
Then install the Python binding.
.. code:: bash
cd ../python
python setup.py install
Message-Passing with Half Precision Message-Passing with Half Precision
----------------------------------- -----------------------------------
DGL with fp16 support allows message-passing on ``float16`` features for both DGL allows message-passing on ``float16 (fp16)`` features for both
UDF(User Defined Function)s and built-in functions (e.g. ``dgl.function.sum``, UDFs (User Defined Functions) and built-in functions (e.g., ``dgl.function.sum``,
``dgl.function.copy_u``). ``dgl.function.copy_u``).
The following examples shows how to use DGL's message-passing API on half-precision The following example shows how to use DGL's message-passing APIs on half-precision
features: features:
>>> import torch >>> import torch
>>> import dgl >>> import dgl
>>> import dgl.function as fn >>> import dgl.function as fn
>>> g = dgl.rand_graph(30, 100).to(0) # Create a graph on GPU w/ 30 nodes and 100 edges. >>> dev = torch.device('cuda')
>>> g.ndata['h'] = torch.rand(30, 16).to(0).half() # Create fp16 node features. >>> g = dgl.rand_graph(30, 100).to(dev) # Create a graph on GPU w/ 30 nodes and 100 edges.
>>> g.edata['w'] = torch.rand(100, 1).to(0).half() # Create fp16 edge features. >>> g.ndata['h'] = torch.rand(30, 16).to(dev).half() # Create fp16 node features.
>>> g.edata['w'] = torch.rand(100, 1).to(dev).half() # Create fp16 edge features.
>>> # Use DGL's built-in functions for message passing on fp16 features. >>> # Use DGL's built-in functions for message passing on fp16 features.
>>> g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'x')) >>> g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'x'))
>>> g.ndata['x'][0] >>> g.ndata['x'].dtype
tensor([0.3391, 0.2208, 0.7163, 0.6655, 0.7031, 0.5854, 0.9404, 0.7720, 0.6562, torch.float16
0.4028, 0.6943, 0.5908, 0.9307, 0.5962, 0.7827, 0.5034],
device='cuda:0', dtype=torch.float16)
>>> g.apply_edges(fn.u_dot_v('h', 'x', 'hx')) >>> g.apply_edges(fn.u_dot_v('h', 'x', 'hx'))
>>> g.edata['hx'][0] >>> g.edata['hx'].dtype
tensor([5.4570], device='cuda:0', dtype=torch.float16) torch.float16
>>> # Use UDF(User Defined Functions) for message passing on fp16 features.
>>> # Use UDFs for message passing on fp16 features.
>>> def message(edges): >>> def message(edges):
... return {'m': edges.src['h'] * edges.data['w']} ... return {'m': edges.src['h'] * edges.data['w']}
... ...
...@@ -65,14 +42,11 @@ features: ...@@ -65,14 +42,11 @@ features:
... return {'hy': (edges.src['h'] * edges.dst['y']).sum(-1, keepdims=True)} ... return {'hy': (edges.src['h'] * edges.dst['y']).sum(-1, keepdims=True)}
... ...
>>> g.update_all(message, reduce) >>> g.update_all(message, reduce)
>>> g.ndata['y'][0] >>> g.ndata['y'].dtype
tensor([0.3394, 0.2209, 0.7168, 0.6655, 0.7026, 0.5854, 0.9404, 0.7720, 0.6562, torch.float16
0.4028, 0.6943, 0.5908, 0.9307, 0.5967, 0.7827, 0.5039],
device='cuda:0', dtype=torch.float16)
>>> g.apply_edges(dot) >>> g.apply_edges(dot)
>>> g.edata['hy'][0] >>> g.edata['hy'].dtype
tensor([5.4609], device='cuda:0', dtype=torch.float16) torch.float16
End-to-End Mixed Precision Training End-to-End Mixed Precision Training
----------------------------------- -----------------------------------
...@@ -80,33 +54,52 @@ DGL relies on PyTorch's AMP package for mixed precision training, ...@@ -80,33 +54,52 @@ DGL relies on PyTorch's AMP package for mixed precision training,
and the user experience is exactly and the user experience is exactly
the same as `PyTorch's <https://pytorch.org/docs/stable/notes/amp_examples.html>`_. the same as `PyTorch's <https://pytorch.org/docs/stable/notes/amp_examples.html>`_.
By wrapping the forward pass (including loss computation) of your GNN model with By wrapping the forward pass with ``torch.cuda.amp.autocast()``, PyTorch automatically
``torch.cuda.amp.autocast()``, PyTorch automatically selects the appropriate datatype selects the appropriate datatype for each op and tensor. Half precision tensors are memory
for each op and tensor. Half precision tensors are memory efficient, most operators efficient, and most operators on half precision tensors are faster as they leverage GPU tensor cores.
on half precision tensors are faster as they leverage GPU's tensorcores.
.. code::
import torch.nn.functional as F
from torch.cuda.amp import autocast
def forward(g, feat, label, mask, model, use_fp16):
with autocast(enabled=use_fp16):
logit = model(g, feat)
loss = F.cross_entropy(logit[mask], label[mask])
return loss
Small Gradients in ``float16`` format have underflow problems (flush to zero).
PyTorch provides a ``GradScaler`` module to address this issue. It multiplies
the loss by a factor and invokes backward pass on the scaled loss to prevent
the underflow problem. It then unscales the computed gradients before the optimizer
updates the parameters. The scale factor is determined automatically.
.. code::
from torch.cuda.amp import GradScaler
Small Gradients in ``float16`` format have underflow problems (flush to zero), and scaler = GradScaler()
PyTorch provides a ``GradScaler`` module to address this issue. ``GradScaler`` multiplies
loss by a factor and invokes backward pass on scaled loss, and unscales gradients before def backward(scaler, loss, optimizer):
optimizers update the parameters, thus preventing the underflow problem. scaler.scale(loss).backward()
The scale factor is determined automatically. scaler.step(optimizer)
scaler.update()
Following is the training script of 3-layer GAT on Reddit dataset (w/ 114 million edges), The following example trains a 3-layer GAT on the Reddit dataset (w/ 114 million edges).
note the difference in codes when ``use_fp16`` is activated/not activated: Pay attention to the differences in the code when ``use_fp16`` is activated or not.
.. code:: .. code::
import torch import torch
import torch.nn as nn import torch.nn as nn
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler
import dgl import dgl
from dgl.data import RedditDataset from dgl.data import RedditDataset
from dgl.nn import GATConv from dgl.nn import GATConv
from dgl.transforms import AddSelfLoop
use_fp16 = True use_fp16 = True
class GAT(nn.Module): class GAT(nn.Module):
def __init__(self, def __init__(self,
in_feats, in_feats,
...@@ -129,48 +122,40 @@ note the difference in codes when ``use_fp16`` is activated/not activated: ...@@ -129,48 +122,40 @@ note the difference in codes when ``use_fp16`` is activated/not activated:
return h return h
# Data loading # Data loading
data = RedditDataset() transform = AddSelfLoop()
device = torch.device(0) data = RedditDataset(transform)
dev = torch.device('cuda')
g = data[0] g = data[0]
g = dgl.add_self_loop(g) g = g.int().to(dev)
g = g.int().to(device)
train_mask = g.ndata['train_mask'] train_mask = g.ndata['train_mask']
features = g.ndata['feat'] feat = g.ndata['feat']
labels = g.ndata['label'] label = g.ndata['label']
in_feats = features.shape[1]
in_feats = feat.shape[1]
n_hidden = 256 n_hidden = 256
n_classes = data.num_classes n_classes = data.num_classes
n_edges = g.number_of_edges()
heads = [1, 1, 1] heads = [1, 1, 1]
model = GAT(in_feats, n_hidden, n_classes, heads) model = GAT(in_feats, n_hidden, n_classes, heads)
model = model.to(device) model = model.to(dev)
model.train()
# Create optimizer # Create optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4) optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
# Create gradient scaler
scaler = GradScaler()
for epoch in range(100): for epoch in range(100):
model.train()
optimizer.zero_grad() optimizer.zero_grad()
loss = forward(g, feat, label, train_mask, model, use_fp16)
# Wrap forward pass with autocast
with autocast(enabled=use_fp16):
logits = model(g, features)
loss = F.cross_entropy(logits[train_mask], labels[train_mask])
if use_fp16: if use_fp16:
# Backprop w/ gradient scaling # Backprop w/ gradient scaling
scaler.scale(loss).backward() backward(scaler, loss, optimizer)
scaler.step(optimizer)
scaler.update()
else: else:
loss.backward() loss.backward()
optimizer.step() optimizer.step()
print('Epoch {} | Loss {}'.format(epoch, loss.item())) print('Epoch {} | Loss {}'.format(epoch, loss.item()))
On an NVIDIA V100 (16GB) machine, training this model without fp16 consumes On an NVIDIA V100 (16GB) machine, training this model without fp16 consumes
15.2GB GPU memory; with fp16 turned on, the training consumes 12.8GB 15.2GB GPU memory; with fp16 turned on, the training consumes 12.8GB
GPU memory, and the loss converges to similar values in both settings. GPU memory, and the loss converges to similar values in both settings.
......
...@@ -249,7 +249,7 @@ To quickly locate the examples of your interest, search for the tagged keywords ...@@ -249,7 +249,7 @@ To quickly locate the examples of your interest, search for the tagged keywords
- Tags: matrix completion, recommender system, link prediction, bipartite graphs - Tags: matrix completion, recommender system, link prediction, bipartite graphs
- <a name="graphsage"></a> Hamilton et al. Inductive Representation Learning on Large Graphs. [Paper link](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf). - <a name="graphsage"></a> Hamilton et al. Inductive Representation Learning on Large Graphs. [Paper link](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf).
- Example code: [PyTorch](../examples/pytorch/graphsage), [PyTorch on ogbn-products](../examples/pytorch/ogb/ogbn-products), [PyTorch on ogbl-ppa](https://github.com/awslabs/dgl-lifesci/tree/master/examples/link_prediction/ogbl-ppa), [MXNet](../examples/mxnet/graphsage) - Example code: [PyTorch](../examples/pytorch/graphsage), [PyTorch on ogbn-products](../examples/pytorch/ogb/ogbn-products), [PyTorch on ogbn-mag](../examples/pytorch/ogb/ogbn-mag), [PyTorch on ogbl-ppa](https://github.com/awslabs/dgl-lifesci/tree/master/examples/link_prediction/ogbl-ppa), [MXNet](../examples/mxnet/graphsage)
- Tags: node classification, sampling, unsupervised learning, link prediction, OGB - Tags: node classification, sampling, unsupervised learning, link prediction, OGB
- <a name="metapath2vec"></a> Dong et al. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. [Paper link](https://dl.acm.org/doi/10.1145/3097983.3098036). - <a name="metapath2vec"></a> Dong et al. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. [Paper link](https://dl.acm.org/doi/10.1145/3097983.3098036).
......
...@@ -64,7 +64,7 @@ class ARMAConv(nn.Module): ...@@ -64,7 +64,7 @@ class ARMAConv(nn.Module):
# assume that the graphs are undirected and graph.in_degrees() is the same as graph.out_degrees() # assume that the graphs are undirected and graph.in_degrees() is the same as graph.out_degrees()
degs = g.in_degrees().float().clamp(min=1) degs = g.in_degrees().float().clamp(min=1)
norm = torch.pow(degs, -0.5).to(feats.device).unsqueeze(1) norm = torch.pow(degs, -0.5).to(feats.device).unsqueeze(1)
output = None output = []
for k in range(self.K): for k in range(self.K):
feats = init_feats feats = init_feats
...@@ -88,13 +88,9 @@ class ARMAConv(nn.Module): ...@@ -88,13 +88,9 @@ class ARMAConv(nn.Module):
if self.activation is not None: if self.activation is not None:
feats = self.activation(feats) feats = self.activation(feats)
output.append(feats)
if output is None: return torch.stack(output).mean(dim=0)
output = feats
else:
output += feats
return output / self.K
class ARMA4NC(nn.Module): class ARMA4NC(nn.Module):
def __init__(self, def __init__(self,
......
...@@ -92,9 +92,10 @@ def main(args): ...@@ -92,9 +92,10 @@ def main(args):
graph.ndata['nd'] = th.tanh(model.layers[i].MLP(layers_feat[i])) graph.ndata['nd'] = th.tanh(model.layers[i].MLP(layers_feat[i]))
for etype in graph.canonical_etypes: for etype in graph.canonical_etypes:
graph.apply_edges(_l1_dist, etype=etype) graph.apply_edges(_l1_dist, etype=etype)
dist[etype] = graph.edges[etype].data['ed'] dist[etype] = graph.edges[etype].data.pop('ed').detach().cpu()
dists.append(dist) dists.append(dist)
p.append(model.layers[i].p) p.append(model.layers[i].p)
graph.ndata.pop('nd')
sampler = CARESampler(p, dists, args.num_layers) sampler = CARESampler(p, dists, args.num_layers)
# train # train
...@@ -103,14 +104,9 @@ def main(args): ...@@ -103,14 +104,9 @@ def main(args):
tr_recall = 0 tr_recall = 0
tr_auc = 0 tr_auc = 0
tr_blk = 0 tr_blk = 0
train_dataloader = dgl.dataloading.DataLoader(graph, train_dataloader = dgl.dataloading.DataLoader(
train_idx, graph, train_idx, sampler, batch_size=args.batch_size,
sampler, shuffle=True, drop_last=False, num_workers=args.num_workers)
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
num_workers=args.num_workers
)
for input_nodes, output_nodes, blocks in train_dataloader: for input_nodes, output_nodes, blocks in train_dataloader:
blocks = [b.to(device) for b in blocks] blocks = [b.to(device) for b in blocks]
...@@ -135,14 +131,9 @@ def main(args): ...@@ -135,14 +131,9 @@ def main(args):
# validation # validation
model.eval() model.eval()
val_dataloader = dgl.dataloading.DataLoader(graph, val_dataloader = dgl.dataloading.DataLoader(
val_idx, graph, val_idx, sampler, batch_size=args.batch_size,
sampler, shuffle=True, drop_last=False, num_workers=args.num_workers)
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
num_workers=args.num_workers
)
val_recall, val_auc, val_loss = evaluate(model, loss_fn, val_dataloader, device) val_recall, val_auc, val_loss = evaluate(model, loss_fn, val_dataloader, device)
...@@ -159,14 +150,9 @@ def main(args): ...@@ -159,14 +150,9 @@ def main(args):
model.eval() model.eval()
if args.early_stop: if args.early_stop:
model.load_state_dict(th.load('es_checkpoint.pt')) model.load_state_dict(th.load('es_checkpoint.pt'))
test_dataloader = dgl.dataloading.DataLoader(graph, test_dataloader = dgl.dataloading.DataLoader(
test_idx, graph, test_idx, sampler, batch_size=args.batch_size,
sampler, shuffle=True, drop_last=False, num_workers=args.num_workers)
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
num_workers=args.num_workers
)
test_recall, test_auc, test_loss = evaluate(model, loss_fn, test_dataloader, device) test_recall, test_auc, test_loss = evaluate(model, loss_fn, test_dataloader, device)
......
...@@ -13,9 +13,10 @@ def _l1_dist(edges): ...@@ -13,9 +13,10 @@ def _l1_dist(edges):
class CARESampler(dgl.dataloading.BlockSampler): class CARESampler(dgl.dataloading.BlockSampler):
def __init__(self, p, dists, num_layers): def __init__(self, p, dists, num_layers):
super().__init__(num_layers) super().__init__()
self.p = p self.p = p
self.dists = dists self.dists = dists
self.num_layers = num_layers
def sample_frontier(self, block_id, g, seed_nodes, *args, **kwargs): def sample_frontier(self, block_id, g, seed_nodes, *args, **kwargs):
with g.local_scope(): with g.local_scope():
...@@ -28,7 +29,7 @@ class CARESampler(dgl.dataloading.BlockSampler): ...@@ -28,7 +29,7 @@ class CARESampler(dgl.dataloading.BlockSampler):
num_neigh = th.ceil(g.in_degrees(node, etype=etype) * self.p[block_id][etype]).int().item() num_neigh = th.ceil(g.in_degrees(node, etype=etype) * self.p[block_id][etype]).int().item()
neigh_dist = self.dists[block_id][etype][edges] neigh_dist = self.dists[block_id][etype][edges]
if neigh_dist.shape[0] > num_neigh: if neigh_dist.shape[0] > num_neigh:
neigh_index = np.argpartition(neigh_dist.cpu().detach(), num_neigh)[:num_neigh] neigh_index = np.argpartition(neigh_dist, num_neigh)[:num_neigh]
else: else:
neigh_index = np.arange(num_neigh) neigh_index = np.arange(num_neigh)
edge_mask[edges[neigh_index]] = 1 edge_mask[edges[neigh_index]] = 1
...@@ -36,6 +37,19 @@ class CARESampler(dgl.dataloading.BlockSampler): ...@@ -36,6 +37,19 @@ class CARESampler(dgl.dataloading.BlockSampler):
return dgl.edge_subgraph(g, new_edges_masks, relabel_nodes=False) return dgl.edge_subgraph(g, new_edges_masks, relabel_nodes=False)
def sample_blocks(self, g, seed_nodes, exclude_eids=None):
output_nodes = seed_nodes
blocks = []
for block_id in reversed(range(self.num_layers)):
frontier = self.sample_frontier(block_id, g, seed_nodes)
eid = frontier.edata[dgl.EID]
block = dgl.to_block(frontier, seed_nodes)
block.edata[dgl.EID] = eid
seed_nodes = block.srcdata[dgl.NID]
blocks.insert(0, block)
return seed_nodes, output_nodes, blocks
def __len__(self): def __len__(self):
return self.num_layers return self.num_layers
......
...@@ -17,7 +17,9 @@ class BesselBasisLayer(nn.Module): ...@@ -17,7 +17,9 @@ class BesselBasisLayer(nn.Module):
self.reset_params() self.reset_params()
def reset_params(self): def reset_params(self):
with torch.no_grad():
torch.arange(1, self.frequencies.numel() + 1, out=self.frequencies).mul_(np.pi) torch.arange(1, self.frequencies.numel() + 1, out=self.frequencies).mul_(np.pi)
self.frequencies.requires_grad_()
def forward(self, g): def forward(self, g):
d_scaled = g.edata['d'] / self.cutoff d_scaled = g.edata['d'] / self.cutoff
......
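The patch above wraps the in-place ``out=`` initialization of the frequency parameter in ``torch.no_grad()`` and then re-enables gradients. A standalone sketch of the same pattern, using a hypothetical ``freq`` parameter:

.. code:: python

    import numpy as np
    import torch
    import torch.nn as nn

    freq = nn.Parameter(torch.empty(8))
    with torch.no_grad():
        # In-place writes into a leaf parameter are only legal inside no_grad().
        torch.arange(1, freq.numel() + 1, out=freq).mul_(np.pi)  # freq_k = k * pi
    freq.requires_grad_()  # keep the parameter trainable afterwards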
# DGL & Pytorch implementation of Enhanced Graph Embedding with Side information (EGES) # DGL & Pytorch implementation of Enhanced Graph Embedding with Side information (EGES)
Paper link: https://arxiv.org/pdf/1803.02349.pdf
## Version Reference code repo: (https://github.com/wangzhegeek/EGES.git)
dgl==0.6.1, torch==1.9.0
## Paper
Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba:
https://arxiv.org/pdf/1803.02349.pdf
https://arxiv.org/abs/1803.02349
## How to run ## How to run
Create folder named `data`. Download two csv files from [here](https://github.com/Wang-Yu-Qing/dgl_data/tree/master/eges_data) into the `data` folder.
Run command: `python main.py` with default configuration, and the following message will show up: - Create a folder named `data`.
`mkdir data`
- Download csv data
`wget https://raw.githubusercontent.com/Wang-Yu-Qing/dgl_data/master/eges_data/action_head.csv -P data/`
`wget https://raw.githubusercontent.com/Wang-Yu-Qing/dgl_data/master/eges_data/jdata_product.csv -P data/`
- Run with the following command (with default configuration)
`python main.py`
## Result
``` ```
Using backend: pytorch
Num skus: 33344, num brands: 3662, num shops: 4785, num cates: 79
Epoch 00000 | Step 00000 | Step Loss 0.9117 | Epoch Avg Loss: 0.9117
Epoch 00000 | Step 00100 | Step Loss 0.8736 | Epoch Avg Loss: 0.8801
Epoch 00000 | Step 00200 | Step Loss 0.8975 | Epoch Avg Loss: 0.8785
Evaluate link prediction AUC: 0.6864
Epoch 00001 | Step 00000 | Step Loss 0.8695 | Epoch Avg Loss: 0.8695
Epoch 00001 | Step 00100 | Step Loss 0.8290 | Epoch Avg Loss: 0.8643
Epoch 00001 | Step 00200 | Step Loss 0.8012 | Epoch Avg Loss: 0.8604
Evaluate link prediction AUC: 0.6875
...
Epoch 00029 | Step 00000 | Step Loss 0.7095 | Epoch Avg Loss: 0.7095
Epoch 00029 | Step 00100 | Step Loss 0.7248 | Epoch Avg Loss: 0.7139
Epoch 00029 | Step 00200 | Step Loss 0.7123 | Epoch Avg Loss: 0.7134
Evaluate link prediction AUC: 0.7084 Evaluate link prediction AUC: 0.7084
``` ```
The link-prediction AUC on the test graph is computed after each epoch.
## Reference
https://github.com/nonva/eges
https://github.com/wangzhegeek/EGES.git
...@@ -2,54 +2,29 @@ Graph Attention Networks (GAT) ...@@ -2,54 +2,29 @@ Graph Attention Networks (GAT)
============ ============
- Paper link: [https://arxiv.org/abs/1710.10903](https://arxiv.org/abs/1710.10903) - Paper link: [https://arxiv.org/abs/1710.10903](https://arxiv.org/abs/1710.10903)
- Author's code repo (in Tensorflow): - Author's code repo (tensorflow implementation):
[https://github.com/PetarV-/GAT](https://github.com/PetarV-/GAT). [https://github.com/PetarV-/GAT](https://github.com/PetarV-/GAT).
- Popular pytorch implementation: - Popular pytorch implementation:
[https://github.com/Diego999/pyGAT](https://github.com/Diego999/pyGAT). [https://github.com/Diego999/pyGAT](https://github.com/Diego999/pyGAT).
Dependencies
------------
- torch v1.0: the autograd support for sparse mm is only available in v1.0.
- requests
- sklearn
```bash
pip install torch==1.0.0 requests
```
How to run How to run
---------- -------
Run with following:
```bash
python3 train.py --dataset=cora --gpu=0
```
Run with the following for multiclass node classification (available datasets: "cora", "citeseer", "pubmed")
```bash ```bash
python3 train.py --dataset=citeseer --gpu=0 --early-stop python3 train.py --dataset cora
``` ```
Run with the following for multilabel classification with PPI dataset
```bash ```bash
python3 train.py --dataset=pubmed --gpu=0 --num-out-heads=8 --weight-decay=0.001 --early-stop python3 train_ppi.py
``` ```
```bash > **_NOTE:_** Users may occasionally run into a low-accuracy issue (e.g., test accuracy < 0.8) due to overfitting. This can be resolved by adding early stopping or reducing the maximum number of training epochs.
python3 train_ppi.py --gpu=0
```
Results Summary
------- -------
* cora: ~0.821
| Dataset | Test Accuracy | Time(s) | Baseline#1 times(s) | Baseline#2 times(s) | * citeseer: ~0.710
| -------- | ------------- | ------- | ------------------- | ------------------- | * pubmed: ~0.780
| Cora | 84.02(0.40) | 0.0113 | 0.0982 (**8.7x**) | 0.0424 (**3.8x**) | * ppi: ~0.9744
| Citeseer | 70.91(0.79) | 0.0111 | n/a | n/a |
| Pubmed | 78.57(0.75) | 0.0115 | n/a | n/a |
| PPI | 0.9836 | n/a | n/a | n/a |
* All the accuracy numbers are obtained after 300 epochs.
* The time measures how long it takes to train one epoch.
* All time is measured on EC2 p3.2xlarge instance w/ V100 GPU.
* Baseline#1: [https://github.com/PetarV-/GAT](https://github.com/PetarV-/GAT).
* Baseline#2: [https://github.com/Diego999/pyGAT](https://github.com/Diego999/pyGAT).
"""
Graph Attention Networks in DGL using SPMV optimization.
References
----------
Paper: https://arxiv.org/abs/1710.10903
Author's code: https://github.com/PetarV-/GAT
Pytorch implementation: https://github.com/Diego999/pyGAT
"""
import torch
import torch.nn as nn
import dgl.function as fn
from dgl.nn import GATConv
class GAT(nn.Module):
def __init__(self,
g,
num_layers,
in_dim,
num_hidden,
num_classes,
heads,
activation,
feat_drop,
attn_drop,
negative_slope,
residual):
super(GAT, self).__init__()
self.g = g
self.num_layers = num_layers
self.gat_layers = nn.ModuleList()
self.activation = activation
if num_layers > 1:
# input projection (no residual)
self.gat_layers.append(GATConv(
in_dim, num_hidden, heads[0],
feat_drop, attn_drop, negative_slope, False, self.activation))
# hidden layers
for l in range(1, num_layers-1):
# due to multi-head, the in_dim = num_hidden * num_heads
self.gat_layers.append(GATConv(
num_hidden * heads[l-1], num_hidden, heads[l],
feat_drop, attn_drop, negative_slope, residual, self.activation))
# output projection
self.gat_layers.append(GATConv(
num_hidden * heads[-2], num_classes, heads[-1],
feat_drop, attn_drop, negative_slope, residual, None))
else:
self.gat_layers.append(GATConv(
in_dim, num_classes, heads[0],
feat_drop, attn_drop, negative_slope, residual, None))
def forward(self, inputs):
h = inputs
for l in range(self.num_layers):
h = self.gat_layers[l](self.g, h)
h = h.flatten(1) if l != self.num_layers - 1 else h.mean(1)
return h
"""
Graph Attention Networks in DGL using SPMV optimization.
Multiple heads are also batched together for faster training.
References
----------
Paper: https://arxiv.org/abs/1710.10903
Author's code: https://github.com/PetarV-/GAT
Pytorch implementation: https://github.com/Diego999/pyGAT
"""
import argparse
import numpy as np
import networkx as nx
import time
import torch import torch
import torch.nn as nn
import torch.nn.functional as F import torch.nn.functional as F
import dgl import dgl.nn as dglnn
from dgl.data import register_data_args
from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset
from dgl import AddSelfLoop
import argparse
from gat import GAT class GAT(nn.Module):
from utils import EarlyStopping def __init__(self,in_size, hid_size, out_size, heads):
super().__init__()
self.gat_layers = nn.ModuleList()
def accuracy(logits, labels): # two-layer GAT
self.gat_layers.append(dglnn.GATConv(in_size, hid_size, heads[0], feat_drop=0.6, attn_drop=0.6, activation=F.elu))
self.gat_layers.append(dglnn.GATConv(hid_size*heads[0], out_size, heads[1], feat_drop=0.6, attn_drop=0.6, activation=None))
def forward(self, g, inputs):
h = inputs
for i, layer in enumerate(self.gat_layers):
h = layer(g, h)
if i == 1: # last layer
h = h.mean(1)
else: # other layer(s)
h = h.flatten(1)
return h
def evaluate(g, features, labels, mask, model):
model.eval()
with torch.no_grad():
logits = model(g, features)
logits = logits[mask]
labels = labels[mask]
_, indices = torch.max(logits, dim=1) _, indices = torch.max(logits, dim=1)
correct = torch.sum(indices == labels) correct = torch.sum(indices == labels)
return correct.item() * 1.0 / len(labels) return correct.item() * 1.0 / len(labels)
def train(g, features, labels, masks, model):
# define train/val samples, loss function and optimizer
train_mask = masks[0]
val_mask = masks[1]
loss_fcn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=5e-4)
def evaluate(model, features, labels, mask): #training loop
model.eval() for epoch in range(200):
with torch.no_grad(): model.train()
logits = model(features) logits = model(g, features)
logits = logits[mask] loss = loss_fcn(logits[train_mask], labels[train_mask])
labels = labels[mask] optimizer.zero_grad()
return accuracy(logits, labels) loss.backward()
optimizer.step()
acc = evaluate(g, features, labels, val_mask, model)
print("Epoch {:05d} | Loss {:.4f} | Accuracy {:.4f} "
. format(epoch, loss.item(), acc))
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default="cora",
help="Dataset name ('cora', 'citeseer', 'pubmed').")
args = parser.parse_args()
print(f'Training with DGL built-in GATConv module.')
def main(args):
# load and preprocess dataset # load and preprocess dataset
transform = AddSelfLoop() # by default, it will first remove self-loops to prevent duplication
if args.dataset == 'cora': if args.dataset == 'cora':
data = CoraGraphDataset() data = CoraGraphDataset(transform=transform)
elif args.dataset == 'citeseer': elif args.dataset == 'citeseer':
data = CiteseerGraphDataset() data = CiteseerGraphDataset(transform=transform)
elif args.dataset == 'pubmed': elif args.dataset == 'pubmed':
data = PubmedGraphDataset() data = PubmedGraphDataset(transform=transform)
else: else:
raise ValueError('Unknown dataset: {}'.format(args.dataset)) raise ValueError('Unknown dataset: {}'.format(args.dataset))
g = data[0] g = data[0]
if args.gpu < 0: device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
cuda = False g = g.int().to(device)
else:
cuda = True
g = g.int().to(args.gpu)
features = g.ndata['feat'] features = g.ndata['feat']
labels = g.ndata['label'] labels = g.ndata['label']
train_mask = g.ndata['train_mask'] masks = g.ndata['train_mask'], g.ndata['val_mask'], g.ndata['test_mask']
val_mask = g.ndata['val_mask']
test_mask = g.ndata['test_mask']
num_feats = features.shape[1]
n_classes = data.num_labels
n_edges = g.number_of_edges()
print("""----Data statistics------'
#Edges %d
#Classes %d
#Train samples %d
#Val samples %d
#Test samples %d""" %
(n_edges, n_classes,
train_mask.int().sum().item(),
val_mask.int().sum().item(),
test_mask.int().sum().item()))
# add self loop
g = dgl.remove_self_loop(g)
g = dgl.add_self_loop(g)
n_edges = g.number_of_edges()
# create model
heads = ([args.num_heads] * (args.num_layers-1)) + [args.num_out_heads]
model = GAT(g,
args.num_layers,
num_feats,
args.num_hidden,
n_classes,
heads,
F.elu,
args.in_drop,
args.attn_drop,
args.negative_slope,
args.residual)
print(model)
if args.early_stop:
stopper = EarlyStopping(patience=100)
if cuda:
model.cuda()
loss_fcn = torch.nn.CrossEntropyLoss()
# use optimizer
optimizer = torch.optim.Adam(
model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
# initialize graph
dur = []
for epoch in range(args.epochs):
model.train()
if epoch >= 3:
if cuda:
torch.cuda.synchronize()
t0 = time.time()
# forward
logits = model(features)
loss = loss_fcn(logits[train_mask], labels[train_mask])
optimizer.zero_grad()
loss.backward()
optimizer.step()
if epoch >= 3:
if cuda:
torch.cuda.synchronize()
dur.append(time.time() - t0)
train_acc = accuracy(logits[train_mask], labels[train_mask])
if args.fastmode:
val_acc = accuracy(logits[val_mask], labels[val_mask])
else:
val_acc = evaluate(model, features, labels, val_mask)
if args.early_stop:
if stopper.step(val_acc, model):
break
print("Epoch {:05d} | Time(s) {:.4f} | Loss {:.4f} | TrainAcc {:.4f} |" # create GAT model
" ValAcc {:.4f} | ETputs(KTEPS) {:.2f}". in_size = features.shape[1]
format(epoch, np.mean(dur), loss.item(), train_acc, out_size = data.num_classes
val_acc, n_edges / np.mean(dur) / 1000)) model = GAT(in_size, 8, out_size, heads=[8,1]).to(device)
print() # model training
if args.early_stop: print('Training...')
model.load_state_dict(torch.load('es_checkpoint.pt')) train(g, features, labels, masks, model)
acc = evaluate(model, features, labels, test_mask)
print("Test Accuracy {:.4f}".format(acc))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GAT')
register_data_args(parser)
parser.add_argument("--gpu", type=int, default=-1,
help="which GPU to use. Set -1 to use CPU.")
parser.add_argument("--epochs", type=int, default=200,
help="number of training epochs")
parser.add_argument("--num-heads", type=int, default=8,
help="number of hidden attention heads")
parser.add_argument("--num-out-heads", type=int, default=1,
help="number of output attention heads")
parser.add_argument("--num-layers", type=int, default=2,
help="number of hidden layers")
parser.add_argument("--num-hidden", type=int, default=8,
help="number of hidden units")
parser.add_argument("--residual", action="store_true", default=False,
help="use residual connection")
parser.add_argument("--in-drop", type=float, default=.6,
help="input feature dropout")
parser.add_argument("--attn-drop", type=float, default=.6,
help="attention dropout")
parser.add_argument("--lr", type=float, default=0.005,
help="learning rate")
parser.add_argument('--weight-decay', type=float, default=5e-4,
help="weight decay")
parser.add_argument('--negative-slope', type=float, default=0.2,
help="the negative slope of leaky relu")
parser.add_argument('--early-stop', action='store_true', default=False,
help="indicates whether to use early stop or not")
parser.add_argument('--fastmode', action="store_true", default=False,
help="skip re-evaluate the validation set")
args = parser.parse_args()
print(args)
main(args) # test the model
print('Testing...')
acc = evaluate(g, features, labels, masks[2], model)
print("Test accuracy {:.4f}".format(acc))
"""
Graph Attention Networks (PPI Dataset) in DGL using SPMV optimization.
Multiple heads are also batched together for faster training.
Compared with the original paper, this code implements
early stopping.
References
----------
Paper: https://arxiv.org/abs/1710.10903
Author's code: https://github.com/PetarV-/GAT
Pytorch implementation: https://github.com/Diego999/pyGAT
"""
import numpy as np import numpy as np
import torch import torch
import dgl import torch.nn as nn
import torch.nn.functional as F import torch.nn.functional as F
import argparse import dgl.nn as dglnn
from sklearn.metrics import f1_score
from gat import GAT
from dgl.data.ppi import PPIDataset from dgl.data.ppi import PPIDataset
from dgl.dataloading import GraphDataLoader from dgl.dataloading import GraphDataLoader
from sklearn.metrics import f1_score
def evaluate(feats, model, subgraph, labels, loss_fcn): class GAT(nn.Module):
with torch.no_grad(): def __init__(self, in_size, hid_size, out_size, heads):
super().__init__()
self.gat_layers = nn.ModuleList()
# three-layer GAT
self.gat_layers.append(dglnn.GATConv(in_size, hid_size, heads[0], activation=F.elu))
self.gat_layers.append(dglnn.GATConv(hid_size*heads[0], hid_size, heads[1], residual=True, activation=F.elu))
self.gat_layers.append(dglnn.GATConv(hid_size*heads[1], out_size, heads[2], residual=True, activation=None))
def forward(self, g, inputs):
h = inputs
for i, layer in enumerate(self.gat_layers):
h = layer(g, h)
if i == 2: # last layer
h = h.mean(1)
else: # other layer(s)
h = h.flatten(1)
return h
def evaluate(g, features, labels, model):
model.eval() model.eval()
model.g = subgraph with torch.no_grad():
for layer in model.gat_layers: output = model(g, features)
layer.g = subgraph pred = np.where(output.data.cpu().numpy() >= 0, 1, 0)
output = model(feats.float()) score = f1_score(labels.data.cpu().numpy(), pred, average='micro')
loss_data = loss_fcn(output, labels.float()) return score
predict = np.where(output.data.cpu().numpy() >= 0., 1, 0)
score = f1_score(labels.data.cpu().numpy(),
predict, average='micro')
return score, loss_data.item()
def main(args): def evaluate_in_batches(dataloader, device, model):
if args.gpu<0: total_score = 0
device = torch.device("cpu") for batch_id, batched_graph in enumerate(dataloader):
else: batched_graph = batched_graph.to(device)
device = torch.device("cuda:" + str(args.gpu)) features = batched_graph.ndata['feat']
labels = batched_graph.ndata['label']
score = evaluate(batched_graph, features, labels, model)
total_score += score
return total_score / (batch_id + 1) # return average score
batch_size = args.batch_size def train(train_dataloader, val_dataloader, device, model):
cur_step = 0 # define loss function and optimizer
patience = args.patience loss_fcn = nn.BCEWithLogitsLoss()
best_score = -1 optimizer = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=0)
best_loss = 10000
# define loss function # training loop
loss_fcn = torch.nn.BCEWithLogitsLoss() for epoch in range(400):
# create the dataset
train_dataset = PPIDataset(mode='train')
valid_dataset = PPIDataset(mode='valid')
test_dataset = PPIDataset(mode='test')
train_dataloader = GraphDataLoader(train_dataset, batch_size=batch_size)
valid_dataloader = GraphDataLoader(valid_dataset, batch_size=batch_size)
test_dataloader = GraphDataLoader(test_dataset, batch_size=batch_size)
g = train_dataset[0]
n_classes = train_dataset.num_labels
num_feats = g.ndata['feat'].shape[1]
g = g.int().to(device)
heads = ([args.num_heads] * (args.num_layers-1)) + [args.num_out_heads]
# define the model
model = GAT(g,
args.num_layers,
num_feats,
args.num_hidden,
n_classes,
heads,
F.elu,
args.in_drop,
args.attn_drop,
args.alpha,
args.residual)
# define the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
model = model.to(device)
for epoch in range(args.epochs):
model.train() model.train()
loss_list = [] logits = []
for batch, subgraph in enumerate(train_dataloader): total_loss = 0
subgraph = subgraph.to(device) # mini-batch loop
model.g = subgraph for batch_id, batched_graph in enumerate(train_dataloader):
for layer in model.gat_layers: batched_graph = batched_graph.to(device)
layer.g = subgraph features = batched_graph.ndata['feat'].float()
logits = model(subgraph.ndata['feat'].float()) labels = batched_graph.ndata['label'].float()
loss = loss_fcn(logits, subgraph.ndata['label']) logits = model(batched_graph, features)
loss = loss_fcn(logits, labels)
optimizer.zero_grad() optimizer.zero_grad()
loss.backward() loss.backward()
optimizer.step() optimizer.step()
loss_list.append(loss.item()) total_loss += loss.item()
loss_data = np.array(loss_list).mean() print("Epoch {:05d} | Loss {:.4f} |". format(epoch, total_loss / (batch_id + 1) ))
print("Epoch {:05d} | Loss: {:.4f}".format(epoch + 1, loss_data))
if epoch % 5 == 0: if (epoch + 1) % 5 == 0:
score_list = [] avg_score = evaluate_in_batches(val_dataloader, device, model) # evaluate F1-score instead of loss
val_loss_list = [] print(" Acc. (F1-score) {:.4f} ". format(avg_score))
for batch, subgraph in enumerate(valid_dataloader):
subgraph = subgraph.to(device)
score, val_loss = evaluate(subgraph.ndata['feat'], model, subgraph, subgraph.ndata['label'], loss_fcn)
score_list.append(score)
val_loss_list.append(val_loss)
mean_score = np.array(score_list).mean()
mean_val_loss = np.array(val_loss_list).mean()
print("Val F1-Score: {:.4f} ".format(mean_score))
# early stop
if mean_score > best_score or best_loss > mean_val_loss:
if mean_score > best_score and best_loss > mean_val_loss:
val_early_loss = mean_val_loss
val_early_score = mean_score
best_score = np.max((mean_score, best_score))
best_loss = np.min((best_loss, mean_val_loss))
cur_step = 0
else:
cur_step += 1
if cur_step == patience:
break
test_score_list = []
for batch, subgraph in enumerate(test_dataloader):
subgraph = subgraph.to(device)
score, test_loss = evaluate(subgraph.ndata['feat'], model, subgraph, subgraph.ndata['label'], loss_fcn)
test_score_list.append(score)
print("Test F1-Score: {:.4f}".format(np.array(test_score_list).mean()))
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GAT') print(f'Training PPI Dataset with DGL built-in GATConv module.')
parser.add_argument("--gpu", type=int, default=-1, device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
help="which GPU to use. Set -1 to use CPU.")
parser.add_argument("--epochs", type=int, default=400, # load and preprocess datasets
help="number of training epochs") train_dataset = PPIDataset(mode='train')
parser.add_argument("--num-heads", type=int, default=4, val_dataset = PPIDataset(mode='valid')
help="number of hidden attention heads") test_dataset = PPIDataset(mode='test')
parser.add_argument("--num-out-heads", type=int, default=6, features = train_dataset[0].ndata['feat']
help="number of output attention heads")
parser.add_argument("--num-layers", type=int, default=3, # create GAT model
help="number of hidden layers") in_size = features.shape[1]
parser.add_argument("--num-hidden", type=int, default=256, out_size = train_dataset.num_labels
help="number of hidden units") model = GAT(in_size, 256, out_size, heads=[4,4,6]).to(device)
parser.add_argument("--residual", action="store_true", default=True,
help="use residual connection") # model training
parser.add_argument("--in-drop", type=float, default=0, print('Training...')
help="input feature dropout") train_dataloader = GraphDataLoader(train_dataset, batch_size=2)
parser.add_argument("--attn-drop", type=float, default=0, val_dataloader = GraphDataLoader(val_dataset, batch_size=2)
help="attention dropout") train(train_dataloader, val_dataloader, device, model)
parser.add_argument("--lr", type=float, default=0.005,
help="learning rate")
parser.add_argument('--weight-decay', type=float, default=0,
help="weight decay")
parser.add_argument('--alpha', type=float, default=0.2,
help="the negative slop of leaky relu")
parser.add_argument('--batch-size', type=int, default=2,
help="batch size used for training, validation and test")
parser.add_argument('--patience', type=int, default=10,
help="used for early stop")
args = parser.parse_args()
print(args)
main(args) # test the model
print('Testing...')
test_dataloader = GraphDataLoader(test_dataset, batch_size=2)
avg_score = evaluate_in_batches(test_dataloader, device, model)
print("Test Accuracy (F1-score) {:.4f}".format(avg_score))