"vscode:/vscode.git/clone" did not exist on "78ecd5086b9aafb7a2f424e001208e1357296af3"
Unverified commit e9b624fe authored by Minjie Wang, committed by GitHub

Merge branch 'master' into dist_part

parents 8086d1ed a88e7f7e
# Accuracy across 10 runs: 0.7788 ± 0.002227
version: 0.0.1
version: 0.0.2
pipeline_name: nodepred
pipeline_mode: train
device: cuda:0
......
# Accuracy across 10 runs: 0.7826 ± 0.004317
version: 0.0.1
version: 0.0.2
pipeline_name: nodepred
pipeline_mode: train
device: cuda:0
......
# Accuracy across 10 runs: 0.7819 ± 0.003176
version: 0.0.1
version: 0.0.2
pipeline_name: nodepred
pipeline_mode: train
device: cuda:0
......
......@@ -4,7 +4,7 @@ from setuptools import find_packages
from distutils.core import setup
setup(name='dglgo',
version='0.0.1',
version='0.0.2',
description='DGL',
author='DGL Team',
author_email='wmjlyjemaine@gmail.com',
......
version: 0.0.1
version: 0.0.2
pipeline_name: nodepred
pipeline_mode: train
device: cpu
......
......@@ -132,6 +132,7 @@ under the ``dgl`` namespace.
DGLGraph.add_self_loop
DGLGraph.remove_self_loop
DGLGraph.to_simple
DGLGraph.to_cugraph
DGLGraph.reorder_graph
Adjacency and incidence matrix
......
......@@ -18,6 +18,7 @@ Operators for constructing :class:`DGLGraph` from raw data formats.
graph
heterograph
from_cugraph
from_scipy
from_networkx
bipartite_from_scipy
......@@ -93,6 +94,7 @@ Operators for generating new graphs by manipulating the structure of the existin
to_bidirected
to_bidirected_stale
to_block
to_cugraph
to_double
to_float
to_half
......
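The new cuGraph interop entries above come in a pair: ``DGLGraph.to_cugraph`` hands a GPU-resident graph to cuGraph, and ``from_cugraph`` converts back. Below is a minimal round-trip sketch, assuming a CUDA build of DGL with the ``cugraph`` package installed; it only uses the API names listed in these hunks.

```python
import dgl

# cuGraph works on GPU data, so start from a graph on a CUDA device.
g = dgl.graph(([0, 1, 2], [1, 2, 0])).to('cuda')

cg = g.to_cugraph()        # DGLGraph -> cugraph.Graph
g2 = dgl.from_cugraph(cg)  # cugraph.Graph -> DGLGraph
print(g2.num_edges())
```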
......@@ -24,11 +24,13 @@ API that is exposed to python is only a few lines of codes:
#include <dgl/runtime/packed_func.h>
#include <dgl/runtime/registry.h>
using namespace dgl::runtime;
DGL_REGISTER_GLOBAL("calculator.MyAdd")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
int a = args[0];
int b = args[1];
*rv = a * b;
*rv = a + b;
});
Compile and build the library. On the python side, create a
......
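Since the sentence above is cut off before the Python-side file is shown, here is only a rough sketch of how the registered function could be invoked, assuming DGL's TVM-derived FFI exposes ``get_global_func`` (this helper and the lookup pattern are an assumption, not part of the commit):

```python
from dgl._ffi.function import get_global_func  # assumed TVM-style FFI helper

# Look up the packed function registered from C++ as "calculator.MyAdd".
my_add = get_global_func("calculator.MyAdd")
print(my_add(3, 4))  # 7 with the corrected body (*rv = a + b)
```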
......@@ -60,7 +60,7 @@ Using CUDA UVA-based neighborhood sampling in DGL data loaders
For the case where the graph is too large to fit onto the GPU memory, we introduce the
CUDA UVA (Unified Virtual Addressing)-based sampling, in which GPUs perform the sampling
on the graph pinned on CPU memory via zero-copy access.
on the graph pinned in CPU memory via zero-copy access.
You can enable UVA-based neighborhood sampling in DGL data loaders via:
* Put the ``train_nid`` onto GPU.
......@@ -99,6 +99,38 @@ especially for multi-GPU training.
Refer to our `GraphSAGE example <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/multi_gpu_node_classification.py>`_ for more details.
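As a rough illustration of the UVA setup described above (pinned graph, seed nodes on GPU), here is a minimal data loader sketch. It assumes DGL 0.8+, where ``dgl.dataloading.DataLoader`` accepts a ``use_uva`` flag; exact argument names may differ across versions.

```python
import dgl
import torch

g = dgl.rand_graph(10000, 200000)   # graph stays in CPU memory
g.create_formats_()                 # materialize sparse formats before pinning

train_nid = torch.arange(1000, device='cuda')         # seed nodes on the GPU
sampler = dgl.dataloading.NeighborSampler([10, 10])   # 2-hop uniform sampling

dataloader = dgl.dataloading.DataLoader(
    g, train_nid, sampler,
    device='cuda',   # produce sampled blocks on the GPU
    use_uva=True,    # pin the graph and sample via zero-copy access
    batch_size=256, shuffle=True, drop_last=False)

for input_nodes, output_nodes, blocks in dataloader:
    pass  # training step goes here
```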
UVA and GPU support for PinSAGESampler/RandomWalkNeighborSampler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PinSAGESampler and RandomWalkNeighborSampler support UVA and GPU sampling.
You can enable them via:
* Pin the graph (for UVA sampling) or put the graph onto GPU (for GPU sampling).
* Put the ``train_nid`` onto GPU.
.. code:: python
g = dgl.heterograph({
('item', 'bought-by', 'user'): ([0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 2, 3, 2, 3]),
('user', 'bought', 'item'): ([0, 1, 0, 1, 2, 3, 2, 3], [0, 0, 1, 1, 2, 2, 3, 3])})
# UVA setup
# g.create_formats_()
# g.pin_memory_()
# GPU setup
device = torch.device('cuda:0')
g = g.to(device)
sampler1 = dgl.sampling.PinSAGESampler(g, 'item', 'user', 4, 0.5, 3, 2)
sampler2 = dgl.sampling.RandomWalkNeighborSampler(g, 4, 0.5, 3, 2, ['bought-by', 'bought'])
train_nid = torch.tensor([0, 2], dtype=g.idtype, device=device)
sampler1(train_nid)
sampler2(train_nid)
Using GPU-based neighbor sampling with DGL functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -106,8 +138,7 @@ You can build your own GPU sampling pipelines with the following functions that
operating on GPU:
* :func:`dgl.sampling.sample_neighbors`
* Only has support for uniform sampling; non-uniform sampling can only run on CPU.
* :func:`dgl.sampling.random_walk`
Subgraph extraction ops:
......
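A minimal sketch of calling ``dgl.sampling.sample_neighbors`` directly on a GPU graph (uniform sampling only, per the note above); the graph and seed nodes shown here are synthetic placeholders.

```python
import dgl
import torch

g = dgl.rand_graph(1000, 20000).to('cuda')   # graph resident on the GPU
seeds = torch.arange(10, device='cuda')      # seed nodes on the same device

# Uniformly sample up to 5 in-neighbors per seed node; the sampling runs on the GPU.
sg = dgl.sampling.sample_neighbors(g, seeds, fanout=5)
print(sg.num_edges())
```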
......@@ -2,59 +2,36 @@
Chapter 8: Mixed Precision Training
===================================
DGL is compatible with `PyTorch's automatic mixed precision package
DGL is compatible with the `PyTorch Automatic Mixed Precision (AMP) package
<https://pytorch.org/docs/stable/amp.html>`_
for mixed precision training, thus saving both training time and GPU memory
consumption. To enable this feature, users need to install PyTorch 1.6+ with python 3.7+ and
build DGL from source file to support ``float16`` data type (this feature is
still in its beta stage and we do not provide official pre-built pip wheels).
Installation
------------
First download DGL's source code from GitHub and build the shared library
with flag ``USE_FP16=ON``.
.. code:: bash
git clone --recurse-submodules https://github.com/dmlc/dgl.git
cd dgl
mkdir build
cd build
cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
make -j
Then install the Python binding.
.. code:: bash
cd ../python
python setup.py install
consumption. This feature requires DGL 0.9+.
Message-Passing with Half Precision
-----------------------------------
DGL with fp16 support allows message-passing on ``float16`` features for both
UDF(User Defined Function)s and built-in functions (e.g. ``dgl.function.sum``,
DGL allows message-passing on ``float16 (fp16)`` features for both
UDFs (User Defined Functions) and built-in functions (e.g., ``dgl.function.sum``,
``dgl.function.copy_u``).
The following examples shows how to use DGL's message-passing API on half-precision
The following example shows how to use DGL's message-passing APIs on half-precision
features:
>>> import torch
>>> import dgl
>>> import dgl.function as fn
>>> g = dgl.rand_graph(30, 100).to(0) # Create a graph on GPU w/ 30 nodes and 100 edges.
>>> g.ndata['h'] = torch.rand(30, 16).to(0).half() # Create fp16 node features.
>>> g.edata['w'] = torch.rand(100, 1).to(0).half() # Create fp16 edge features.
>>> dev = torch.device('cuda')
>>> g = dgl.rand_graph(30, 100).to(dev) # Create a graph on GPU w/ 30 nodes and 100 edges.
>>> g.ndata['h'] = torch.rand(30, 16).to(dev).half() # Create fp16 node features.
>>> g.edata['w'] = torch.rand(100, 1).to(dev).half() # Create fp16 edge features.
>>> # Use DGL's built-in functions for message passing on fp16 features.
>>> g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'x'))
>>> g.ndata['x'][0]
tensor([0.3391, 0.2208, 0.7163, 0.6655, 0.7031, 0.5854, 0.9404, 0.7720, 0.6562,
0.4028, 0.6943, 0.5908, 0.9307, 0.5962, 0.7827, 0.5034],
device='cuda:0', dtype=torch.float16)
>>> g.ndata['x'].dtype
torch.float16
>>> g.apply_edges(fn.u_dot_v('h', 'x', 'hx'))
>>> g.edata['hx'][0]
tensor([5.4570], device='cuda:0', dtype=torch.float16)
>>> # Use UDF(User Defined Functions) for message passing on fp16 features.
>>> g.edata['hx'].dtype
torch.float16
>>> # Use UDFs for message passing on fp16 features.
>>> def message(edges):
... return {'m': edges.src['h'] * edges.data['w']}
...
......@@ -65,14 +42,11 @@ features:
... return {'hy': (edges.src['h'] * edges.dst['y']).sum(-1, keepdims=True)}
...
>>> g.update_all(message, reduce)
>>> g.ndata['y'][0]
tensor([0.3394, 0.2209, 0.7168, 0.6655, 0.7026, 0.5854, 0.9404, 0.7720, 0.6562,
0.4028, 0.6943, 0.5908, 0.9307, 0.5967, 0.7827, 0.5039],
device='cuda:0', dtype=torch.float16)
>>> g.ndata['y'].dtype
torch.float16
>>> g.apply_edges(dot)
>>> g.edata['hy'][0]
tensor([5.4609], device='cuda:0', dtype=torch.float16)
>>> g.edata['hy'].dtype
torch.float16
End-to-End Mixed Precision Training
-----------------------------------
......@@ -80,33 +54,52 @@ DGL relies on PyTorch's AMP package for mixed precision training,
and the user experience is exactly
the same as `PyTorch's <https://pytorch.org/docs/stable/notes/amp_examples.html>`_.
By wrapping the forward pass (including loss computation) of your GNN model with
``torch.cuda.amp.autocast()``, PyTorch automatically selects the appropriate datatype
for each op and tensor. Half precision tensors are memory efficient, most operators
on half precision tensors are faster as they leverage GPU's tensorcores.
By wrapping the forward pass with ``torch.cuda.amp.autocast()``, PyTorch automatically
selects the appropriate datatype for each op and tensor. Half-precision tensors are memory
efficient, and most operators on them are faster because they leverage the GPU's Tensor Cores.
Small Gradients in ``float16`` format have underflow problems (flush to zero), and
PyTorch provides a ``GradScaler`` module to address this issue. ``GradScaler`` multiplies
loss by a factor and invokes backward pass on scaled loss, and unscales graidents before
optimizers update the parameters, thus preventing the underflow problem.
The scale factor is determined automatically.
.. code::
Following is the training script of 3-layer GAT on Reddit dataset (w/ 114 million edges),
note the difference in codes when ``use_fp16`` is activated/not activated:
import torch.nn.functional as F
from torch.cuda.amp import autocast
def forward(g, feat, label, mask, model, use_fp16):
with autocast(enabled=use_fp16):
logit = model(g, feat)
loss = F.cross_entropy(logit[mask], label[mask])
return loss
Small gradients in ``float16`` format suffer from underflow (they flush to zero).
PyTorch provides a ``GradScaler`` module to address this issue. It multiplies
the loss by a factor and invokes the backward pass on the scaled loss to prevent
underflow. It then unscales the computed gradients before the optimizer
updates the parameters. The scale factor is determined automatically.
.. code::
import torch
from torch.cuda.amp import GradScaler
scaler = GradScaler()
def backward(scaler, loss, optimizer):
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
The following example trains a 3-layer GAT on the Reddit dataset (w/ 114 million edges).
Pay attention to the differences in the code when ``use_fp16`` is activated or not.
.. code::
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler
import dgl
from dgl.data import RedditDataset
from dgl.nn import GATConv
from dgl.transforms import AddSelfLoop
use_fp16 = True
class GAT(nn.Module):
def __init__(self,
in_feats,
......@@ -129,48 +122,40 @@ note the difference in codes when ``use_fp16`` is activated/not activated:
return h
# Data loading
data = RedditDataset()
device = torch.device(0)
transform = AddSelfLoop()
data = RedditDataset(transform)
dev = torch.device('cuda')
g = data[0]
g = dgl.add_self_loop(g)
g = g.int().to(device)
g = g.int().to(dev)
train_mask = g.ndata['train_mask']
features = g.ndata['feat']
labels = g.ndata['label']
in_feats = features.shape[1]
feat = g.ndata['feat']
label = g.ndata['label']
in_feats = feat.shape[1]
n_hidden = 256
n_classes = data.num_classes
n_edges = g.number_of_edges()
heads = [1, 1, 1]
model = GAT(in_feats, n_hidden, n_classes, heads)
model = model.to(device)
model = model.to(dev)
model.train()
# Create optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
# Create gradient scaler
scaler = GradScaler()
for epoch in range(100):
model.train()
optimizer.zero_grad()
loss = forward(g, feat, label, train_mask, model, use_fp16)
# Wrap forward pass with autocast
with autocast(enabled=use_fp16):
logits = model(g, features)
loss = F.cross_entropy(logits[train_mask], labels[train_mask])
if use_fp16:
# Backprop w/ gradient scaling
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
backward(scaler, loss, optimizer)
else:
loss.backward()
optimizer.step()
print('Epoch {} | Loss {}'.format(epoch, loss.item()))
On an NVIDIA V100 (16GB) machine, training this model without fp16 consumes
15.2GB of GPU memory; with fp16 turned on, training consumes 12.8GB of
GPU memory. The loss converges to similar values in both settings.
......
......@@ -249,7 +249,7 @@ To quickly locate the examples of your interest, search for the tagged keywords
- Tags: matrix completion, recommender system, link prediction, bipartite graphs
- <a name="graphsage"></a> Hamilton et al. Inductive Representation Learning on Large Graphs. [Paper link](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf).
- Example code: [PyTorch](../examples/pytorch/graphsage), [PyTorch on ogbn-products](../examples/pytorch/ogb/ogbn-products), [PyTorch on ogbl-ppa](https://github.com/awslabs/dgl-lifesci/tree/master/examples/link_prediction/ogbl-ppa), [MXNet](../examples/mxnet/graphsage)
- Example code: [PyTorch](../examples/pytorch/graphsage), [PyTorch on ogbn-products](../examples/pytorch/ogb/ogbn-products), [PyTorch on ogbn-mag](../examples/pytorch/ogb/ogbn-mag), [PyTorch on ogbl-ppa](https://github.com/awslabs/dgl-lifesci/tree/master/examples/link_prediction/ogbl-ppa), [MXNet](../examples/mxnet/graphsage)
- Tags: node classification, sampling, unsupervised learning, link prediction, OGB
- <a name="metapath2vec"></a> Dong et al. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. [Paper link](https://dl.acm.org/doi/10.1145/3097983.3098036).
......
......@@ -64,7 +64,7 @@ class ARMAConv(nn.Module):
# assume that the graphs are undirected and graph.in_degrees() is the same as graph.out_degrees()
degs = g.in_degrees().float().clamp(min=1)
norm = torch.pow(degs, -0.5).to(feats.device).unsqueeze(1)
output = None
output = []
for k in range(self.K):
feats = init_feats
......@@ -88,13 +88,9 @@ class ARMAConv(nn.Module):
if self.activation is not None:
feats = self.activation(feats)
if output is None:
output = feats
else:
output += feats
return output / self.K
output.append(feats)
return torch.stack(output).mean(dim=0)
class ARMA4NC(nn.Module):
def __init__(self,
......
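The refactor above swaps the running ``output += feats`` accumulation for collecting the K per-stack outputs in a list and averaging them with ``torch.stack(...).mean(dim=0)``. A small standalone check of the equivalence (not part of the example code):

```python
import torch

feats = [torch.randn(4, 8) for _ in range(3)]   # stand-ins for the K per-stack outputs

running = sum(feats) / len(feats)               # old style: accumulate, then divide by K
stacked = torch.stack(feats).mean(dim=0)        # new style: stack on a new dim, then mean

assert torch.allclose(running, stacked, atol=1e-6)
```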
......@@ -92,9 +92,10 @@ def main(args):
graph.ndata['nd'] = th.tanh(model.layers[i].MLP(layers_feat[i]))
for etype in graph.canonical_etypes:
graph.apply_edges(_l1_dist, etype=etype)
dist[etype] = graph.edges[etype].data['ed']
dist[etype] = graph.edges[etype].data.pop('ed').detach().cpu()
dists.append(dist)
p.append(model.layers[i].p)
graph.ndata.pop('nd')
sampler = CARESampler(p, dists, args.num_layers)
# train
......@@ -103,14 +104,9 @@ def main(args):
tr_recall = 0
tr_auc = 0
tr_blk = 0
train_dataloader = dgl.dataloading.DataLoader(graph,
train_idx,
sampler,
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
num_workers=args.num_workers
)
train_dataloader = dgl.dataloading.DataLoader(
graph, train_idx, sampler, batch_size=args.batch_size,
shuffle=True, drop_last=False, num_workers=args.num_workers)
for input_nodes, output_nodes, blocks in train_dataloader:
blocks = [b.to(device) for b in blocks]
......@@ -135,14 +131,9 @@ def main(args):
# validation
model.eval()
val_dataloader = dgl.dataloading.DataLoader(graph,
val_idx,
sampler,
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
num_workers=args.num_workers
)
val_dataloader = dgl.dataloading.DataLoader(
graph, val_idx, sampler, batch_size=args.batch_size,
shuffle=True, drop_last=False, num_workers=args.num_workers)
val_recall, val_auc, val_loss = evaluate(model, loss_fn, val_dataloader, device)
......@@ -159,14 +150,9 @@ def main(args):
model.eval()
if args.early_stop:
model.load_state_dict(th.load('es_checkpoint.pt'))
test_dataloader = dgl.dataloading.DataLoader(graph,
test_idx,
sampler,
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
num_workers=args.num_workers
)
test_dataloader = dgl.dataloading.DataLoader(
graph, test_idx, sampler, batch_size=args.batch_size,
shuffle=True, drop_last=False, num_workers=args.num_workers)
test_recall, test_auc, test_loss = evaluate(model, loss_fn, test_dataloader, device)
......
......@@ -13,9 +13,10 @@ def _l1_dist(edges):
class CARESampler(dgl.dataloading.BlockSampler):
def __init__(self, p, dists, num_layers):
super().__init__(num_layers)
super().__init__()
self.p = p
self.dists = dists
self.num_layers = num_layers
def sample_frontier(self, block_id, g, seed_nodes, *args, **kwargs):
with g.local_scope():
......@@ -28,7 +29,7 @@ class CARESampler(dgl.dataloading.BlockSampler):
num_neigh = th.ceil(g.in_degrees(node, etype=etype) * self.p[block_id][etype]).int().item()
neigh_dist = self.dists[block_id][etype][edges]
if neigh_dist.shape[0] > num_neigh:
neigh_index = np.argpartition(neigh_dist.cpu().detach(), num_neigh)[:num_neigh]
neigh_index = np.argpartition(neigh_dist, num_neigh)[:num_neigh]
else:
neigh_index = np.arange(num_neigh)
edge_mask[edges[neigh_index]] = 1
......@@ -36,6 +37,19 @@ class CARESampler(dgl.dataloading.BlockSampler):
return dgl.edge_subgraph(g, new_edges_masks, relabel_nodes=False)
def sample_blocks(self, g, seed_nodes, exclude_eids=None):
output_nodes = seed_nodes
blocks = []
for block_id in reversed(range(self.num_layers)):
frontier = self.sample_frontier(block_id, g, seed_nodes)
eid = frontier.edata[dgl.EID]
block = dgl.to_block(frontier, seed_nodes)
block.edata[dgl.EID] = eid
seed_nodes = block.srcdata[dgl.NID]
blocks.insert(0, block)
return seed_nodes, output_nodes, blocks
def __len__(self):
return self.num_layers
......
......@@ -17,7 +17,9 @@ class BesselBasisLayer(nn.Module):
self.reset_params()
def reset_params(self):
torch.arange(1, self.frequencies.numel() + 1, out=self.frequencies).mul_(np.pi)
with torch.no_grad():
torch.arange(1, self.frequencies.numel() + 1, out=self.frequencies).mul_(np.pi)
self.frequencies.requires_grad_()
def forward(self, g):
d_scaled = g.edata['d'] / self.cutoff
......@@ -25,4 +27,4 @@ class BesselBasisLayer(nn.Module):
d_scaled = torch.unsqueeze(d_scaled, -1)
d_cutoff = self.envelope(d_scaled)
g.edata['rbf'] = d_cutoff * torch.sin(self.frequencies * d_scaled)
return g
\ No newline at end of file
return g
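The ``torch.no_grad()`` guard added above is needed because autograd rejects ``out=`` writes and in-place updates on a leaf tensor that requires grad. A tiny standalone reproduction of the pattern (``freq`` is a placeholder parameter, not the module's attribute):

```python
import math
import torch

freq = torch.nn.Parameter(torch.empty(4))

# Autograd refuses out=/in-place writes into a tensor that requires grad;
# the no_grad() block sidesteps that check during (re)initialization.
with torch.no_grad():
    torch.arange(1, freq.numel() + 1, out=freq).mul_(math.pi)
freq.requires_grad_()
print(freq)
```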
# DGL & Pytorch implementation of Enhanced Graph Embedding with Side information (EGES)
## Version
dgl==0.6.1, torch==1.9.0
## Paper
Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba:
https://arxiv.org/pdf/1803.02349.pdf
https://arxiv.org/abs/1803.02349
Paper link: https://arxiv.org/pdf/1803.02349.pdf
Reference code repo: (https://github.com/wangzhegeek/EGES.git)
## How to run
Create folder named `data`. Download two csv files from [here](https://github.com/Wang-Yu-Qing/dgl_data/tree/master/eges_data) into the `data` folder.
Run command: `python main.py` with default configuration, and the following message will shown up:
- Create a folder named `data`.
`mkdir data`
- Download csv data
`wget https://raw.githubusercontent.com/Wang-Yu-Qing/dgl_data/master/eges_data/action_head.csv -P data/`
`wget https://raw.githubusercontent.com/Wang-Yu-Qing/dgl_data/master/eges_data/jdata_product.csv -P data/`
- Run with the following command (with default configuration)
`python main.py`
## Result
```
Using backend: pytorch
Num skus: 33344, num brands: 3662, num shops: 4785, num cates: 79
Epoch 00000 | Step 00000 | Step Loss 0.9117 | Epoch Avg Loss: 0.9117
Epoch 00000 | Step 00100 | Step Loss 0.8736 | Epoch Avg Loss: 0.8801
Epoch 00000 | Step 00200 | Step Loss 0.8975 | Epoch Avg Loss: 0.8785
Evaluate link prediction AUC: 0.6864
Epoch 00001 | Step 00000 | Step Loss 0.8695 | Epoch Avg Loss: 0.8695
Epoch 00001 | Step 00100 | Step Loss 0.8290 | Epoch Avg Loss: 0.8643
Epoch 00001 | Step 00200 | Step Loss 0.8012 | Epoch Avg Loss: 0.8604
Evaluate link prediction AUC: 0.6875
...
Epoch 00029 | Step 00000 | Step Loss 0.7095 | Epoch Avg Loss: 0.7095
Epoch 00029 | Step 00100 | Step Loss 0.7248 | Epoch Avg Loss: 0.7139
Epoch 00029 | Step 00200 | Step Loss 0.7123 | Epoch Avg Loss: 0.7134
Evaluate link prediction AUC: 0.7084
```
The link-prediction AUC on the test graph is computed after each epoch.
## Reference
https://github.com/nonva/eges
https://github.com/wangzhegeek/EGES.git
......@@ -2,54 +2,29 @@ Graph Attention Networks (GAT)
============
- Paper link: [https://arxiv.org/abs/1710.10903](https://arxiv.org/abs/1710.10903)
- Author's code repo (in Tensorflow):
- Author's code repo (tensorflow implementation):
[https://github.com/PetarV-/GAT](https://github.com/PetarV-/GAT).
- Popular pytorch implementation:
[https://github.com/Diego999/pyGAT](https://github.com/Diego999/pyGAT).
Dependencies
------------
- torch v1.0: the autograd support for sparse mm is only available in v1.0.
- requests
- sklearn
```bash
pip install torch==1.0.0 requests
```
How to run
----------
Run with following:
```bash
python3 train.py --dataset=cora --gpu=0
```
-------
Run with the following for multiclass node classification (available datasets: "cora", "citeseer", "pubmed")
```bash
python3 train.py --dataset=citeseer --gpu=0 --early-stop
python3 train.py --dataset cora
```
Run with the following for multilabel classification with PPI dataset
```bash
python3 train.py --dataset=pubmed --gpu=0 --num-out-heads=8 --weight-decay=0.001 --early-stop
python3 train_ppi.py
```
```bash
python3 train_ppi.py --gpu=0
```
> **_NOTE:_** Users may occasionally run into a low-accuracy issue (e.g., test accuracy < 0.8) due to overfitting. This can be resolved by adding early stopping or reducing the maximum number of training epochs.
Results
Summary
-------
| Dataset | Test Accuracy | Time(s) | Baseline#1 times(s) | Baseline#2 times(s) |
| -------- | ------------- | ------- | ------------------- | ------------------- |
| Cora | 84.02(0.40) | 0.0113 | 0.0982 (**8.7x**) | 0.0424 (**3.8x**) |
| Citeseer | 70.91(0.79) | 0.0111 | n/a | n/a |
| Pubmed | 78.57(0.75) | 0.0115 | n/a | n/a |
| PPI | 0.9836 | n/a | n/a | n/a |
* All the accuracy numbers are obtained after 300 epochs.
* The time measures how long it takes to train one epoch.
* All time is measured on EC2 p3.2xlarge instance w/ V100 GPU.
* Baseline#1: [https://github.com/PetarV-/GAT](https://github.com/PetarV-/GAT).
* Baseline#2: [https://github.com/Diego999/pyGAT](https://github.com/Diego999/pyGAT).
* cora: ~0.821
* citeseer: ~0.710
* pubmed: ~0.780
* ppi: ~0.9744
"""
Graph Attention Networks in DGL using SPMV optimization.
References
----------
Paper: https://arxiv.org/abs/1710.10903
Author's code: https://github.com/PetarV-/GAT
Pytorch implementation: https://github.com/Diego999/pyGAT
"""
import torch
import torch.nn as nn
import dgl.function as fn
from dgl.nn import GATConv
class GAT(nn.Module):
def __init__(self,
g,
num_layers,
in_dim,
num_hidden,
num_classes,
heads,
activation,
feat_drop,
attn_drop,
negative_slope,
residual):
super(GAT, self).__init__()
self.g = g
self.num_layers = num_layers
self.gat_layers = nn.ModuleList()
self.activation = activation
if num_layers > 1:
# input projection (no residual)
self.gat_layers.append(GATConv(
in_dim, num_hidden, heads[0],
feat_drop, attn_drop, negative_slope, False, self.activation))
# hidden layers
for l in range(1, num_layers-1):
# due to multi-head, the in_dim = num_hidden * num_heads
self.gat_layers.append(GATConv(
num_hidden * heads[l-1], num_hidden, heads[l],
feat_drop, attn_drop, negative_slope, residual, self.activation))
# output projection
self.gat_layers.append(GATConv(
num_hidden * heads[-2], num_classes, heads[-1],
feat_drop, attn_drop, negative_slope, residual, None))
else:
self.gat_layers.append(GATConv(
in_dim, num_classes, heads[0],
feat_drop, attn_drop, negative_slope, residual, None))
def forward(self, inputs):
h = inputs
for l in range(self.num_layers):
h = self.gat_layers[l](self.g, h)
h = h.flatten(1) if l != self.num_layers - 1 else h.mean(1)
return h
"""
Graph Attention Networks in DGL using SPMV optimization.
Multiple heads are also batched together for faster training.
References
----------
Paper: https://arxiv.org/abs/1710.10903
Author's code: https://github.com/PetarV-/GAT
Pytorch implementation: https://github.com/Diego999/pyGAT
"""
import argparse
import numpy as np
import networkx as nx
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.data import register_data_args
import dgl.nn as dglnn
from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset
from dgl import AddSelfLoop
import argparse
from gat import GAT
from utils import EarlyStopping
def accuracy(logits, labels):
_, indices = torch.max(logits, dim=1)
correct = torch.sum(indices == labels)
return correct.item() * 1.0 / len(labels)
def evaluate(model, features, labels, mask):
class GAT(nn.Module):
def __init__(self,in_size, hid_size, out_size, heads):
super().__init__()
self.gat_layers = nn.ModuleList()
# two-layer GAT
self.gat_layers.append(dglnn.GATConv(in_size, hid_size, heads[0], feat_drop=0.6, attn_drop=0.6, activation=F.elu))
self.gat_layers.append(dglnn.GATConv(hid_size*heads[0], out_size, heads[1], feat_drop=0.6, attn_drop=0.6, activation=None))
def forward(self, g, inputs):
h = inputs
for i, layer in enumerate(self.gat_layers):
h = layer(g, h)
if i == 1: # last layer
h = h.mean(1)
else: # other layer(s)
h = h.flatten(1)
return h
def evaluate(g, features, labels, mask, model):
model.eval()
with torch.no_grad():
logits = model(features)
logits = model(g, features)
logits = logits[mask]
labels = labels[mask]
return accuracy(logits, labels)
def main(args):
_, indices = torch.max(logits, dim=1)
correct = torch.sum(indices == labels)
return correct.item() * 1.0 / len(labels)
def train(g, features, labels, masks, model):
# define train/val samples, loss function and optimizer
train_mask = masks[0]
val_mask = masks[1]
loss_fcn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=5e-4)
#training loop
for epoch in range(200):
model.train()
logits = model(g, features)
loss = loss_fcn(logits[train_mask], labels[train_mask])
optimizer.zero_grad()
loss.backward()
optimizer.step()
acc = evaluate(g, features, labels, val_mask, model)
print("Epoch {:05d} | Loss {:.4f} | Accuracy {:.4f} "
. format(epoch, loss.item(), acc))
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default="cora",
help="Dataset name ('cora', 'citeseer', 'pubmed').")
args = parser.parse_args()
print(f'Training with DGL built-in GATConv module.')
# load and preprocess dataset
transform = AddSelfLoop() # by default, it will first remove self-loops to prevent duplication
if args.dataset == 'cora':
data = CoraGraphDataset()
data = CoraGraphDataset(transform=transform)
elif args.dataset == 'citeseer':
data = CiteseerGraphDataset()
data = CiteseerGraphDataset(transform=transform)
elif args.dataset == 'pubmed':
data = PubmedGraphDataset()
data = PubmedGraphDataset(transform=transform)
else:
raise ValueError('Unknown dataset: {}'.format(args.dataset))
g = data[0]
if args.gpu < 0:
cuda = False
else:
cuda = True
g = g.int().to(args.gpu)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
g = g.int().to(device)
features = g.ndata['feat']
labels = g.ndata['label']
train_mask = g.ndata['train_mask']
val_mask = g.ndata['val_mask']
test_mask = g.ndata['test_mask']
num_feats = features.shape[1]
n_classes = data.num_labels
n_edges = g.number_of_edges()
print("""----Data statistics------'
#Edges %d
#Classes %d
#Train samples %d
#Val samples %d
#Test samples %d""" %
(n_edges, n_classes,
train_mask.int().sum().item(),
val_mask.int().sum().item(),
test_mask.int().sum().item()))
# add self loop
g = dgl.remove_self_loop(g)
g = dgl.add_self_loop(g)
n_edges = g.number_of_edges()
# create model
heads = ([args.num_heads] * (args.num_layers-1)) + [args.num_out_heads]
model = GAT(g,
args.num_layers,
num_feats,
args.num_hidden,
n_classes,
heads,
F.elu,
args.in_drop,
args.attn_drop,
args.negative_slope,
args.residual)
print(model)
if args.early_stop:
stopper = EarlyStopping(patience=100)
if cuda:
model.cuda()
loss_fcn = torch.nn.CrossEntropyLoss()
# use optimizer
optimizer = torch.optim.Adam(
model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
# initialize graph
dur = []
for epoch in range(args.epochs):
model.train()
if epoch >= 3:
if cuda:
torch.cuda.synchronize()
t0 = time.time()
# forward
logits = model(features)
loss = loss_fcn(logits[train_mask], labels[train_mask])
optimizer.zero_grad()
loss.backward()
optimizer.step()
if epoch >= 3:
if cuda:
torch.cuda.synchronize()
dur.append(time.time() - t0)
train_acc = accuracy(logits[train_mask], labels[train_mask])
if args.fastmode:
val_acc = accuracy(logits[val_mask], labels[val_mask])
else:
val_acc = evaluate(model, features, labels, val_mask)
if args.early_stop:
if stopper.step(val_acc, model):
break
print("Epoch {:05d} | Time(s) {:.4f} | Loss {:.4f} | TrainAcc {:.4f} |"
" ValAcc {:.4f} | ETputs(KTEPS) {:.2f}".
format(epoch, np.mean(dur), loss.item(), train_acc,
val_acc, n_edges / np.mean(dur) / 1000))
print()
if args.early_stop:
model.load_state_dict(torch.load('es_checkpoint.pt'))
acc = evaluate(model, features, labels, test_mask)
print("Test Accuracy {:.4f}".format(acc))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GAT')
register_data_args(parser)
parser.add_argument("--gpu", type=int, default=-1,
help="which GPU to use. Set -1 to use CPU.")
parser.add_argument("--epochs", type=int, default=200,
help="number of training epochs")
parser.add_argument("--num-heads", type=int, default=8,
help="number of hidden attention heads")
parser.add_argument("--num-out-heads", type=int, default=1,
help="number of output attention heads")
parser.add_argument("--num-layers", type=int, default=2,
help="number of hidden layers")
parser.add_argument("--num-hidden", type=int, default=8,
help="number of hidden units")
parser.add_argument("--residual", action="store_true", default=False,
help="use residual connection")
parser.add_argument("--in-drop", type=float, default=.6,
help="input feature dropout")
parser.add_argument("--attn-drop", type=float, default=.6,
help="attention dropout")
parser.add_argument("--lr", type=float, default=0.005,
help="learning rate")
parser.add_argument('--weight-decay', type=float, default=5e-4,
help="weight decay")
parser.add_argument('--negative-slope', type=float, default=0.2,
help="the negative slope of leaky relu")
parser.add_argument('--early-stop', action='store_true', default=False,
help="indicates whether to use early stop or not")
parser.add_argument('--fastmode', action="store_true", default=False,
help="skip re-evaluate the validation set")
args = parser.parse_args()
print(args)
main(args)
masks = g.ndata['train_mask'], g.ndata['val_mask'], g.ndata['test_mask']
# create GAT model
in_size = features.shape[1]
out_size = data.num_classes
model = GAT(in_size, 8, out_size, heads=[8,1]).to(device)
# model training
print('Training...')
train(g, features, labels, masks, model)
# test the model
print('Testing...')
acc = evaluate(g, features, labels, masks[2], model)
print("Test accuracy {:.4f}".format(acc))
"""
Graph Attention Networks (PPI Dataset) in DGL using SPMV optimization.
Multiple heads are also batched together for faster training.
Compared with the original paper, this code implements
early stopping.
References
----------
Paper: https://arxiv.org/abs/1710.10903
Author's code: https://github.com/PetarV-/GAT
Pytorch implementation: https://github.com/Diego999/pyGAT
"""
import numpy as np
import torch
import dgl
import torch.nn as nn
import torch.nn.functional as F
import argparse
from sklearn.metrics import f1_score
from gat import GAT
import dgl.nn as dglnn
from dgl.data.ppi import PPIDataset
from dgl.dataloading import GraphDataLoader
from sklearn.metrics import f1_score
def evaluate(feats, model, subgraph, labels, loss_fcn):
with torch.no_grad():
model.eval()
model.g = subgraph
for layer in model.gat_layers:
layer.g = subgraph
output = model(feats.float())
loss_data = loss_fcn(output, labels.float())
predict = np.where(output.data.cpu().numpy() >= 0., 1, 0)
score = f1_score(labels.data.cpu().numpy(),
predict, average='micro')
return score, loss_data.item()
class GAT(nn.Module):
def __init__(self, in_size, hid_size, out_size, heads):
super().__init__()
self.gat_layers = nn.ModuleList()
# three-layer GAT
self.gat_layers.append(dglnn.GATConv(in_size, hid_size, heads[0], activation=F.elu))
self.gat_layers.append(dglnn.GATConv(hid_size*heads[0], hid_size, heads[1], residual=True, activation=F.elu))
self.gat_layers.append(dglnn.GATConv(hid_size*heads[1], out_size, heads[2], residual=True, activation=None))
def main(args):
if args.gpu<0:
device = torch.device("cpu")
else:
device = torch.device("cuda:" + str(args.gpu))
def forward(self, g, inputs):
h = inputs
for i, layer in enumerate(self.gat_layers):
h = layer(g, h)
if i == 2: # last layer
h = h.mean(1)
else: # other layer(s)
h = h.flatten(1)
return h
batch_size = args.batch_size
cur_step = 0
patience = args.patience
best_score = -1
best_loss = 10000
# define loss function
loss_fcn = torch.nn.BCEWithLogitsLoss()
# create the dataset
train_dataset = PPIDataset(mode='train')
valid_dataset = PPIDataset(mode='valid')
test_dataset = PPIDataset(mode='test')
train_dataloader = GraphDataLoader(train_dataset, batch_size=batch_size)
valid_dataloader = GraphDataLoader(valid_dataset, batch_size=batch_size)
test_dataloader = GraphDataLoader(test_dataset, batch_size=batch_size)
g = train_dataset[0]
n_classes = train_dataset.num_labels
num_feats = g.ndata['feat'].shape[1]
g = g.int().to(device)
heads = ([args.num_heads] * (args.num_layers-1)) + [args.num_out_heads]
# define the model
model = GAT(g,
args.num_layers,
num_feats,
args.num_hidden,
n_classes,
heads,
F.elu,
args.in_drop,
args.attn_drop,
args.alpha,
args.residual)
# define the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
model = model.to(device)
for epoch in range(args.epochs):
def evaluate(g, features, labels, model):
model.eval()
with torch.no_grad():
output = model(g, features)
pred = np.where(output.data.cpu().numpy() >= 0, 1, 0)
score = f1_score(labels.data.cpu().numpy(), pred, average='micro')
return score
def evaluate_in_batches(dataloader, device, model):
total_score = 0
for batch_id, batched_graph in enumerate(dataloader):
batched_graph = batched_graph.to(device)
features = batched_graph.ndata['feat']
labels = batched_graph.ndata['label']
score = evaluate(batched_graph, features, labels, model)
total_score += score
return total_score / (batch_id + 1) # return average score
def train(train_dataloader, val_dataloader, device, model):
# define loss function and optimizer
loss_fcn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=0)
# training loop
for epoch in range(400):
model.train()
loss_list = []
for batch, subgraph in enumerate(train_dataloader):
subgraph = subgraph.to(device)
model.g = subgraph
for layer in model.gat_layers:
layer.g = subgraph
logits = model(subgraph.ndata['feat'].float())
loss = loss_fcn(logits, subgraph.ndata['label'])
logits = []
total_loss = 0
# mini-batch loop
for batch_id, batched_graph in enumerate(train_dataloader):
batched_graph = batched_graph.to(device)
features = batched_graph.ndata['feat'].float()
labels = batched_graph.ndata['label'].float()
logits = model(batched_graph, features)
loss = loss_fcn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
loss_list.append(loss.item())
loss_data = np.array(loss_list).mean()
print("Epoch {:05d} | Loss: {:.4f}".format(epoch + 1, loss_data))
if epoch % 5 == 0:
score_list = []
val_loss_list = []
for batch, subgraph in enumerate(valid_dataloader):
subgraph = subgraph.to(device)
score, val_loss = evaluate(subgraph.ndata['feat'], model, subgraph, subgraph.ndata['label'], loss_fcn)
score_list.append(score)
val_loss_list.append(val_loss)
mean_score = np.array(score_list).mean()
mean_val_loss = np.array(val_loss_list).mean()
print("Val F1-Score: {:.4f} ".format(mean_score))
# early stop
if mean_score > best_score or best_loss > mean_val_loss:
if mean_score > best_score and best_loss > mean_val_loss:
val_early_loss = mean_val_loss
val_early_score = mean_score
best_score = np.max((mean_score, best_score))
best_loss = np.min((best_loss, mean_val_loss))
cur_step = 0
else:
cur_step += 1
if cur_step == patience:
break
test_score_list = []
for batch, subgraph in enumerate(test_dataloader):
subgraph = subgraph.to(device)
score, test_loss = evaluate(subgraph.ndata['feat'], model, subgraph, subgraph.ndata['label'], loss_fcn)
test_score_list.append(score)
print("Test F1-Score: {:.4f}".format(np.array(test_score_list).mean()))
total_loss += loss.item()
print("Epoch {:05d} | Loss {:.4f} |". format(epoch, total_loss / (batch_id + 1) ))
if (epoch + 1) % 5 == 0:
avg_score = evaluate_in_batches(val_dataloader, device, model) # evaluate F1-score instead of loss
print(" Acc. (F1-score) {:.4f} ". format(avg_score))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GAT')
parser.add_argument("--gpu", type=int, default=-1,
help="which GPU to use. Set -1 to use CPU.")
parser.add_argument("--epochs", type=int, default=400,
help="number of training epochs")
parser.add_argument("--num-heads", type=int, default=4,
help="number of hidden attention heads")
parser.add_argument("--num-out-heads", type=int, default=6,
help="number of output attention heads")
parser.add_argument("--num-layers", type=int, default=3,
help="number of hidden layers")
parser.add_argument("--num-hidden", type=int, default=256,
help="number of hidden units")
parser.add_argument("--residual", action="store_true", default=True,
help="use residual connection")
parser.add_argument("--in-drop", type=float, default=0,
help="input feature dropout")
parser.add_argument("--attn-drop", type=float, default=0,
help="attention dropout")
parser.add_argument("--lr", type=float, default=0.005,
help="learning rate")
parser.add_argument('--weight-decay', type=float, default=0,
help="weight decay")
parser.add_argument('--alpha', type=float, default=0.2,
help="the negative slop of leaky relu")
parser.add_argument('--batch-size', type=int, default=2,
help="batch size used for training, validation and test")
parser.add_argument('--patience', type=int, default=10,
help="used for early stop")
args = parser.parse_args()
print(args)
print(f'Training PPI Dataset with DGL built-in GATConv module.')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# load and preprocess datasets
train_dataset = PPIDataset(mode='train')
val_dataset = PPIDataset(mode='valid')
test_dataset = PPIDataset(mode='test')
features = train_dataset[0].ndata['feat']
# create GAT model
in_size = features.shape[1]
out_size = train_dataset.num_labels
model = GAT(in_size, 256, out_size, heads=[4,4,6]).to(device)
# model training
print('Training...')
train_dataloader = GraphDataLoader(train_dataset, batch_size=2)
val_dataloader = GraphDataLoader(val_dataset, batch_size=2)
train(train_dataloader, val_dataloader, device, model)
main(args)
# test the model
print('Testing...')
test_dataloader = GraphDataLoader(test_dataset, batch_size=2)
avg_score = evaluate_in_batches(test_dataloader, device, model)
print("Test Accuracy (F1-score) {:.4f}".format(avg_score))