Unverified commit d41d07d0 authored by Quan (Andy) Gan, committed by GitHub

[Doc and bugfix] Add docs and user guide and update tutorial for sampling pipeline (#3774)



* huuuuge update

* remove

* lint

* lint

* fix

* what happened to nccl

* update multi-gpu unsupervised graphsage example

* replace most of the dgl.mp.process with torch.mp.spawn

* update if condition for use_uva case

* update user guide

* address comments

* incorporating suggestions from @jermainewang

* oops

* fix tutorial to pass CI

* oops

* fix again
Co-authored-by: Xin Yao <xiny@nvidia.com>
parent 3bd5a9b6
......@@ -4,78 +4,48 @@ dgl.dataloading
=================================
.. automodule:: dgl.dataloading
.. currentmodule:: dgl.dataloading
DataLoaders
-----------
.. currentmodule:: dgl.dataloading.pytorch
DGL DataLoader for mini-batch training works similarly to PyTorch's DataLoader.
It has a generator interface that returns mini-batches sampled from some given graphs.
DGL provides two DataLoaders: a ``NodeDataLoader`` for the node classification task
and an ``EdgeDataLoader`` for the edge/link prediction task.
.. autoclass:: NodeDataLoader
.. autoclass:: EdgeDataLoader
.. autoclass:: GraphDataLoader
.. autoclass:: DistNodeDataLoader
.. autoclass:: DistEdgeDataLoader
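For example, a minimal usage sketch of iterating over a graph with a neighbor sampler
(the graph ``g`` and the training node ID tensor ``train_nids`` are assumed to exist
and are not defined here):

.. code:: python

    sampler = dgl.dataloading.NeighborSampler([15, 10])
    dataloader = dgl.dataloading.DataLoader(
        g, train_nids, sampler,
        batch_size=1024, shuffle=True, drop_last=False, num_workers=0)
    for input_nodes, output_nodes, blocks in dataloader:
        ...   # train on the returned message flow graphs (blocks)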
.. _api-dataloading-neighbor-sampling:
Neighbor Sampler
----------------
.. currentmodule:: dgl.dataloading.neighbor
Neighbor samplers are classes that control the behavior of ``DataLoader`` s
to sample neighbors. All of them inherit the base :class:`BlockSampler` class, but implement
different neighbor sampling strategies by overriding the ``sample_frontier`` or
the ``sample_blocks`` methods.
.. autoclass:: BlockSampler
:members: sample_frontier, sample_blocks, sample
.. autoclass:: MultiLayerNeighborSampler
:members: sample_frontier
:show-inheritance:
.. autoclass:: MultiLayerFullNeighborSampler
:show-inheritance:
.. autosummary::
:toctree: ../../generated/
Subgraph Iterators
------------------
Subgraph iterators iterate over the original graph in subgraphs. One should use subgraph
iterators with ``GraphDataLoader`` as follows:
DataLoader
NodeDataLoader
EdgeDataLoader
GraphDataLoader
DistNodeDataLoader
DistEdgeDataLoader
.. code:: python
sgiter = dgl.dataloading.ClusterGCNSubgraphIterator(
g, num_partitions=100, cache_directory='.', refresh=True)
dataloader = dgl.dataloading.GraphDataLoader(sgiter, batch_size=4, num_workers=0)
for subgraph_batch in dataloader:
train_on(subgraph_batch)
.. autoclass:: dgl.dataloading.dataloader.SubgraphIterator
.. autoclass:: dgl.dataloading.cluster_gcn.ClusterGCNSubgraphIterator
.. _api-dataloading-neighbor-sampling:
ShaDow-GNN Subgraph Sampler
---------------------------
.. currentmodule:: dgl.dataloading.shadow
Samplers
--------
.. autoclass:: ShaDowKHopSampler
.. autosummary::
:toctree: ../../generated/
.. _api-dataloading-collators:
Sampler
BlockSampler
NeighborSampler
MultiLayerFullNeighborSampler
ClusterGCNSampler
ShaDowKHopSampler
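For instance, a minimal sketch of the ShaDow-GNN style sampler, which yields the subgraph
induced by the sampled neighborhood instead of a list of message flow graphs (the graph
``g`` and the node ID tensor ``train_nids`` are assumed to exist):

.. code:: python

    sampler = dgl.dataloading.ShaDowKHopSampler([10, 5])
    dataloader = dgl.dataloading.DataLoader(
        g, train_nids, sampler, batch_size=1024, shuffle=True)
    for input_nodes, output_nodes, subgraph in dataloader:
        ...   # train on the induced subgraph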
Collators
---------
.. currentmodule:: dgl.dataloading
Sampler Transformations
-----------------------
Collators are platform-agnostic classes that generate the mini-batches
given the graphs and indices to sample from.
.. autosummary::
:toctree: ../../generated/
.. autoclass:: NodeCollator
.. autoclass:: EdgeCollator
.. autoclass:: GraphCollator
as_edge_prediction_sampler
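For example, a minimal sketch that turns a node-wise neighbor sampler into an edge
prediction sampler with uniform negative sampling (the graph ``g`` and the edge ID
tensor ``train_eids`` are assumed to exist):

.. code:: python

    sampler = dgl.dataloading.NeighborSampler([15, 10])
    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, negative_sampler=dgl.dataloading.negative_sampler.Uniform(5))
    dataloader = dgl.dataloading.DataLoader(
        g, train_eids, sampler, batch_size=1024, shuffle=True)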
.. _api-dataloading-negative-sampling:
......@@ -83,30 +53,24 @@ Negative Samplers for Link Prediction
-------------------------------------
.. currentmodule:: dgl.dataloading.negative_sampler
Negative samplers are classes that control the behavior of the ``EdgeDataLoader``
to generate negative edges.
.. autoclass:: Uniform
:members: __call__
Negative samplers are classes that control the behavior of the edge prediction samplers
to generate negative edges.
.. autoclass:: GlobalUniform
:members: __call__
Async Copying to/from GPUs
--------------------------
.. currentmodule:: dgl.dataloading
.. autosummary::
:toctree: ../../generated/
Data can be copied from the CPU to the GPU
while the GPU is being used for
computation, using the :class:`AsyncTransferer`.
For the transfer to be fully asynchronous, the context the
:class:`AsyncTransferer`
is created with must be a GPU context, and the input tensor must be in
pinned memory.
Uniform
PerSourceUniform
GlobalUniform
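A minimal sketch of calling a negative sampler directly (the graph ``g`` and the
positive edge ID tensor ``eids`` are assumed to exist):

.. code:: python

    neg_sampler = dgl.dataloading.negative_sampler.GlobalUniform(5)
    neg_src, neg_dst = neg_sampler(g, eids)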
Utility Class and Functions for Feature Prefetching
---------------------------------------------------
.. currentmodule:: dgl.dataloading.base
.. autoclass:: AsyncTransferer
:members: __init__, async_copy
.. autosummary::
:toctree: ../../generated/
.. autoclass:: async_transferer.Transfer
:members: wait
LazyFeature
set_node_lazy_features
set_edge_lazy_features
set_src_lazy_features
set_dst_lazy_features
......@@ -173,7 +173,8 @@ set at each iteration. ``prop_edges_YYY`` applies traversal algorithm ``YYY`` an
Utilities
-----------------------------------------------
Other utilities for controlling randomness, saving and loading graphs, etc.
Other utilities for controlling randomness, saving and loading graphs, applying
the same function to every element in a container, etc.
.. autosummary::
:toctree: ../../generated/
......@@ -181,3 +182,4 @@ Other utilities for controlling randomness, saving and loading graphs, etc.
seed
save_graphs
load_graphs
apply_each
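For instance, a minimal sketch of ``apply_each`` on a dictionary of per-type features
(the tensors below are made up for illustration):

.. code:: python

    import torch
    feats = {'user': torch.randn(5, 8), 'game': torch.randn(3, 8)}
    # Apply the same function to every element in the container.
    squared = dgl.apply_each(feats, lambda x: x ** 2)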
......@@ -202,8 +202,8 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-ba
(see the section of mini-batch training). The low-level APIs require users to write code
to explicitly define how a layer of nodes is sampled (e.g., using :func:`dgl.sampling.sample_neighbors` ).
The high-level sampling APIs implement a few popular sampling algorithms for node classification
and link prediction tasks (e.g., :class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader` ).
and link prediction tasks (e.g., :class:`~dgl.dataloading.NodeDataLoader` and
:class:`~dgl.dataloading.EdgeDataLoader` ).
The distributed sampling module follows the same design and provides two levels of sampling APIs.
For the lower-level sampling API, it provides :func:`~dgl.distributed.sample_neighbors` for
......@@ -240,10 +240,10 @@ difference is that users need to use :func:`dgl.distributed.sample_neighbors` an
for batch in dataloader:
...
The high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader` ) have distributed counterparts
(:class:`~dgl.dataloading.pytorch.DistNodeDataLoader` and
:class:`~dgl.dataloading.pytorch.DistEdgeDataLoader`). The code is exactly the
The high-level sampling APIs (:class:`~dgl.dataloading.NodeDataLoader` and
:class:`~dgl.dataloading.EdgeDataLoader` ) have distributed counterparts
(:class:`~dgl.dataloading.DistNodeDataLoader` and
:class:`~dgl.dataloading.DistEdgeDataLoader`). The code is exactly the
same as single-process sampling otherwise.
.. code:: python
......
......@@ -20,7 +20,7 @@ You can use the
To use the neighborhood sampler provided by DGL for edge classification,
one needs to instead combine it with
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`, which iterates
:func:`~dgl.dataloading.as_edge_prediction_sampler`, which iterates
over a set of edges in minibatches, yielding the subgraph induced by the
edge minibatch and *message flow graphs* (MFGs) to be consumed by the module below.
......@@ -30,7 +30,8 @@ putting the list of generated MFGs onto GPU.
.. code:: python
dataloader = dgl.dataloading.EdgeDataLoader(
sampler = dgl.dataloading.as_edge_prediction_sampler(sampler)
dataloader = dgl.dataloading.DataLoader(
g, train_eid_dict, sampler,
batch_size=1024,
shuffle=True,
......@@ -50,6 +51,8 @@ putting the list of generated MFGs onto GPU.
detailed explanation of the concept of MFGs, please refer to
:ref:`guide-minibatch-customizing-neighborhood-sampler`.
.. _guide-minibatch-edge-classification-sampler-exclude:
Removing edges in the minibatch from the original graph for neighbor sampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -62,8 +65,8 @@ advantage.
Therefore in edge classification you sometimes would like to exclude the
edges sampled in the minibatch from the original graph for neighborhood
sampling, as well as the reverse edges of the sampled edges on an
undirected graph. You can specify ``exclude='reverse_id'`` in instantiation
of :class:`~dgl.dataloading.pytorch.EdgeDataLoader`, with the mapping of the edge
undirected graph. You can specify ``exclude='reverse_id'`` in calling
:func:`~dgl.dataloading.as_edge_prediction_sampler`, with the mapping of the edge
IDs to their reverse edge IDs. Usually doing so will lead to a much slower
sampling process due to the extra work of locating the reverse edges involved in the
minibatch and removing them.
......@@ -71,16 +74,11 @@ and removing them.
.. code:: python
n_edges = g.number_of_edges()
dataloader = dgl.dataloading.EdgeDataLoader(
sampler = dgl.dataloading.as_edge_prediction_sampler(
sampler, exclude='reverse_id', reverse_eids=torch.cat([
torch.arange(n_edges // 2, n_edges), torch.arange(0, n_edges // 2)]))
dataloader = dgl.dataloading.DataLoader(
g, train_eid_dict, sampler,
# The following two arguments are specifically for excluding the minibatch
# edges and their reverse edges from the original graph for neighborhood
# sampling.
exclude='reverse_id',
reverse_eids=torch.cat([
torch.arange(n_edges // 2, n_edges), torch.arange(0, n_edges // 2)]),
batch_size=1024,
shuffle=True,
drop_last=False,
......@@ -248,15 +246,16 @@ over the edge types for :meth:`~dgl.DGLHeteroGraph.apply_edges`.
Data loader definition is also very similar to that of node
classification. The only difference is that you need
:class:`~dgl.dataloading.pytorch.EdgeDataLoader` instead of
:class:`~dgl.dataloading.pytorch.NodeDataLoader`, and you will be supplying a
:func:`~dgl.dataloading.as_edge_prediction_sampler`,
and you will be supplying a
dictionary of edge types and edge ID tensors instead of a dictionary of
node types and node ID tensors.
.. code:: python
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.EdgeDataLoader(
sampler = dgl.dataloading.as_edge_prediction_sampler(sampler)
dataloader = dgl.dataloading.DataLoader(
g, train_eid_dict, sampler,
batch_size=1024,
shuffle=True,
......@@ -278,16 +277,12 @@ reverse edges then goes as follows.
.. code:: python
dataloader = dgl.dataloading.EdgeDataLoader(
g, train_eid_dict, sampler,
# The following two arguments are specifically for excluding the minibatch
# edges and their reverse edges from the original graph for neighborhood
# sampling.
exclude='reverse_types',
sampler = dgl.dataloading.as_edge_prediction_sampler(
sampler, exclude='reverse_types',
reverse_etypes={'follow': 'followed by', 'followed by': 'follow',
'purchase': 'purchased by', 'purchased by': 'purchase'}
'purchase': 'purchased by', 'purchased by': 'purchase'})
dataloader = dgl.dataloading.DataLoader(
g, train_eid_dict, sampler,
batch_size=1024,
shuffle=True,
drop_last=False,
......
......@@ -48,13 +48,13 @@ One can use GPU-based neighborhood sampling with DGL data loaders via:
* Set ``num_workers`` argument to 0, because CUDA does not allow multiple processes
accessing the same context.
All the other arguments for the :class:`~dgl.dataloading.pytorch.NodeDataLoader` can be
All the other arguments for the :class:`~dgl.dataloading.DataLoader` can be
the same as the other user guides and tutorials.
.. code:: python
g = g.to('cuda:0')
dataloader = dgl.dataloading.NodeDataLoader(
dataloader = dgl.dataloading.DataLoader(
g, # The graph must be on GPU.
train_nid,
sampler,
......@@ -64,8 +64,6 @@ the same as the other user guides and tutorials.
drop_last=False,
shuffle=True)
GPU-based neighbor sampling also works for :class:`~dgl.dataloading.pytorch.EdgeDataLoader` since DGL 0.8.
.. note::
GPU-based neighbor sampling also works for custom neighborhood samplers as long as
......@@ -91,14 +89,13 @@ You can enable UVA-based neighborhood sampling in DGL data loaders via:
* Set ``num_workers`` argument to 0, because CUDA does not allow multiple processes
accessing the same context.
All the other arguments for the :class:`~dgl.dataloading.pytorch.NodeDataLoader` can be
All the other arguments for the :class:`~dgl.dataloading.DataLoader` can be
the same as the other user guides and tutorials.
UVA-based neighbor sampling also works for :class:`~dgl.dataloading.pytorch.EdgeDataLoader`.
.. code:: python
g = g.pin_memory_()
dataloader = dgl.dataloading.NodeDataLoader(
dataloader = dgl.dataloading.DataLoader(
g, # The graph must be pinned.
train_nid,
sampler,
......@@ -116,7 +113,7 @@ especially for multi-GPU training.
To use UVA-based sampling in multi-GPU training, you should first materialize all the
necessary sparse formats of the graph and copy them to the shared memory explicitly
before spawning training processes. Then you should pin the shared graph in each training
process respectively. Refer to our `GraphSAGE example <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_sampling_multi_gpu.py>`_ for more details.
process respectively. Refer to our `GraphSAGE example <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/multi_gpu_node_classification.py>`_ for more details.
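The following is only a minimal sketch of that workflow (a homogeneous graph ``g``, a
training node ID tensor ``train_nids`` and 4 GPUs are assumed); the linked example is
the authoritative version.

.. code:: python

    import torch
    import torch.multiprocessing as mp
    import dgl

    def run(rank, world_size, graph, train_nids):
        torch.cuda.set_device(rank)
        # Pin the shared graph inside each training process to enable UVA access.
        graph = graph.pin_memory_()
        sampler = dgl.dataloading.NeighborSampler([15, 10, 5])
        dataloader = dgl.dataloading.DataLoader(
            graph, train_nids.cuda(), sampler,
            device='cuda', use_uva=True,
            batch_size=1024, shuffle=True, drop_last=False, num_workers=0)
        for input_nodes, output_nodes, blocks in dataloader:
            ...   # forward/backward as usual

    if __name__ == '__main__':
        graph = ...        # the graph to sample from
        train_nids = ...   # training node IDs
        # Materialize the sparse formats and move the graph structure to shared
        # memory before spawning the training processes.
        graph.create_formats_()
        graph = graph.shared_memory('train_graph')
        mp.spawn(run, args=(4, graph, train_nids), nprocs=4)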
Using GPU-based neighbor sampling with DGL functions
......
......@@ -15,7 +15,7 @@ classification.
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
:class:`~dgl.dataloading.pytorch.EdgeDataLoader` in DGL also
:func:`~dgl.dataloading.as_edge_prediction_sampler` in DGL also
supports generating negative samples for link prediction. To do so, you
need to provide the negative sampling function.
:class:`~dgl.dataloading.negative_sampler.Uniform` is a
......@@ -27,9 +27,10 @@ uniformly for each source node of an edge.
.. code:: python
dataloader = dgl.dataloading.EdgeDataLoader(
sampler = dgl.dataloading.as_edge_prediction_sampler(
sampler, negative_sampler=dgl.dataloading.negative_sampler.Uniform(5))
dataloader = dgl.dataloading.DataLoader(
g, train_seeds, sampler,
negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
......@@ -60,10 +61,10 @@ proportional to a power of degrees.
src = src.repeat_interleave(self.k)
dst = self.weights.multinomial(len(src), replacement=True)
return src, dst
dataloader = dgl.dataloading.EdgeDataLoader(
sampler = dgl.dataloading.as_edge_prediction_sampler(
sampler, negative_sampler=NegativeSampler(g, 5))
dataloader = dgl.dataloading.DataLoader(
g, train_seeds, sampler,
negative_sampler=NegativeSampler(g, 5),
batch_size=args.batch_size,
shuffle=True,
drop_last=False,
......@@ -229,9 +230,10 @@ ID tensors.
.. code:: python
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.EdgeDataLoader(
sampler = dgl.dataloading.as_edge_prediction_sampler(
sampler, negative_sampler=dgl.dataloading.negative_sampler.Uniform(5))
dataloader = dgl.dataloading.DataLoader(
g, train_eid_dict, sampler,
negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
batch_size=1024,
shuffle=True,
drop_last=False,
......@@ -269,10 +271,10 @@ sampler. For instance, the following iterates over all edges of the heterogeneo
train_eid_dict = {
etype: g.edges(etype=etype, form='eid')
for etype in g.canonical_etypes}
dataloader = dgl.dataloading.EdgeDataLoader(
sampler = dgl.dataloading.as_edge_prediction_sampler(
sampler, negative_sampler=NegativeSampler(g, 5))
dataloader = dgl.dataloading.DataLoader(
g, train_eid_dict, sampler,
negative_sampler=NegativeSampler(g, 5),
batch_size=1024,
shuffle=True,
drop_last=False,
......
......@@ -26,8 +26,8 @@ The simplest neighborhood sampler is
which makes the node gather messages from all of its neighbors.
To use a sampler provided by DGL, one also needs to combine it with
:class:`~dgl.dataloading.pytorch.NodeDataLoader`, which iterates
over a set of nodes in minibatches.
:class:`~dgl.dataloading.DataLoader`, which iterates
over a set of indices (nodes in this case) in minibatches.
For example, the following code creates a PyTorch DataLoader that
iterates over the training node ID array ``train_nids`` in batches,
......@@ -42,7 +42,7 @@ putting the list of generated MFGs onto GPU.
import torch.nn.functional as F
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
dataloader = dgl.dataloading.DataLoader(
g, train_nids, sampler,
batch_size=1024,
shuffle=True,
......@@ -212,7 +212,7 @@ removed for simplicity):
Some of the samplers provided by DGL also support heterogeneous graphs.
For example, one can still use the provided
:class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler` class and
:class:`~dgl.dataloading.pytorch.NodeDataLoader` class for
:class:`~dgl.dataloading.DataLoader` class for
stochastic training. For full-neighbor sampling, the only difference
would be that you would specify a dictionary of node
types and node IDs for the training set.
......@@ -220,7 +220,7 @@ types and node IDs for the training set.
.. code:: python
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.NodeDataLoader(
dataloader = dgl.dataloading.DataLoader(
g, train_nid_dict, sampler,
batch_size=1024,
shuffle=True,
......
.. _guide-minibatch-prefetching:
6.8 Feature Prefetching
-----------------------
In minibatch training of GNNs, especially with neighbor sampling approaches, we often see
that a large amount of node features needs to be copied to the device for GNN computation.
To mitigate this bottleneck of data movement, DGL supports *feature prefetching*
so that the model computation and data movement can happen in parallel.
Enabling Prefetching with DGL's Builtin Samplers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All the DGL samplers in :ref:`api-dataloading` allow users to specify which
node and edge data to prefetch via arguments like :attr:`prefetch_node_feats`.
For example, the following code asks :class:`dgl.dataloading.NeighborSampler` to prefetch
the node data named ``feat`` and save it to the ``srcdata`` of the first message flow
graph. It also asks the sampler to prefetch and save the node data named ``label``
to the ``dstdata`` of the last message flow graph:
.. code:: python
graph = ... # the graph to sample from
graph.ndata['feat'] = ... # node feature
graph.ndata['label'] = ... # node label
train_nids = ... # a 1-D integer tensor of training node IDs
# create a sampler and specify what data to prefetch
sampler = dgl.dataloading.NeighborSampler(
[15, 10, 5], prefetch_node_feats=['feat'], prefetch_labels=['label'])
# create a dataloader
dataloader = dgl.dataloading.DataLoader(
graph, train_nids, sampler,
batch_size=32,
... # other arguments
)
for mini_batch in dataloader:
# unpack mini batch
input_nodes, output_nodes, subgs = mini_batch
# the following data has been pre-fetched
feat = subgs[0].srcdata['feat']
label = subgs[-1].dstdata['label']
train(subgs, feat, label)
.. note::
Even without specifying the prefetch arguments, users can still access
``subgs[0].srcdata['feat']`` and ``subgs[-1].dstdata['label']`` because DGL
internally keeps a reference to the node/edge data of the original graph when
a subgraph is created. Accessing subgraph features triggers data fetching
from the original graph immediately, while prefetching ensures that the data
is ready before the mini-batch is returned from the data loader.
Enabling Prefetching in Custom Samplers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users can implement their own rules of prefetching when writing custom samplers.
Here is the code of ``NeighborSampler`` with prefetching:
.. code:: python
class NeighborSampler(dgl.dataloading.Sampler):
def __init__(self,
fanouts : list[int],
prefetch_node_feats: list[str] = None,
prefetch_edge_feats: list[str] = None,
prefetch_labels: list[str] = None):
super().__init__()
self.fanouts = fanouts
self.prefetch_node_feats = prefetch_node_feats
self.prefetch_edge_feats = prefetch_edge_feats
self.prefetch_labels = prefetch_labels
def sample(self, g, seed_nodes):
output_nodes = seed_nodes
subgs = []
for fanout in reversed(self.fanouts):
# Sample a fixed number of neighbors of the current seed nodes.
sg = g.sample_neighbors(seed_nodes, fanout)
# Convert this subgraph to a message flow graph.
sg = dgl.to_block(sg, seed_nodes)
seed_nodes = sg.srcdata[dgl.NID]
subgs.insert(0, sg)
input_nodes = seed_nodes
# handle prefetching
dgl.set_src_lazy_features(subgs[0], self.prefetch_node_feats)
dgl.set_dst_lazy_features(subgs[-1], self.prefetch_labels)
for subg in subgs:
dgl.set_edge_lazy_features(subg, self.prefetch_edge_feats)
return input_nodes, output_nodes, subgs
Using :func:`~dgl.set_src_lazy_features`, :func:`~dgl.set_dst_lazy_features`
and :func:`~dgl.set_edge_lazy_features`, users can tell ``DataLoader`` which
features to prefetch and where to save them (``srcdata``, ``dstdata`` or ``edata``).
See :ref:`guide-minibatch-customizing-neighborhood-sampler` for more explanations
on how to write a custom graph sampler.
\ No newline at end of file
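A minimal usage sketch of the custom sampler above (the graph ``g``, with node data
``feat`` and ``label``, and the training node ID tensor ``train_nids`` are assumed to
exist, as in the earlier examples):

.. code:: python

    sampler = NeighborSampler(
        [15, 10, 5], prefetch_node_feats=['feat'], prefetch_labels=['label'])
    dataloader = dgl.dataloading.DataLoader(
        g, train_nids, sampler, batch_size=1024, shuffle=True, drop_last=False)
    for input_nodes, output_nodes, subgs in dataloader:
        feat = subgs[0].srcdata['feat']      # filled in via set_src_lazy_features
        label = subgs[-1].dstdata['label']   # filled in via set_dst_lazy_features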
......@@ -75,3 +75,4 @@ sampling:
minibatch-nn
minibatch-inference
minibatch-gpu-sampling
minibatch-prefetching
......@@ -15,7 +15,7 @@ from numpy import random
from torch.nn.parameter import Parameter
import dgl
import dgl.function as fn
import dgl.multiprocessing as mp
import torch.multiprocessing as mp
from utils import *
......@@ -491,13 +491,7 @@ def train_model(network_data):
if n_gpus == 1:
run(0, n_gpus, args, devices, data)
else:
procs = []
for proc_id in range(n_gpus):
p = mp.Process(target=run, args=(proc_id, n_gpus, args, devices, data))
p.start()
procs.append(p)
for p in procs:
p.join()
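# Note: torch.multiprocessing.spawn passes the process index as the first
# argument to run(); the remaining arguments come from args.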
mp.spawn(run, args=(n_gpus, args, devices, data), nprocs=n_gpus)
if __name__ == "__main__":
......
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
from ogb.nodeproppred import DglNodePropPredDataset
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, sg, x):
h = x
for l, layer in enumerate(self.layers):
h = layer(sg, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
return h
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
graph.ndata['label'] = labels
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph.ndata['train_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, train_idx, True)
graph.ndata['valid_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, valid_idx, True)
graph.ndata['test_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, test_idx, True)
model = SAGE(graph.ndata['feat'].shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
num_partitions = 1000
sampler = dgl.dataloading.ClusterGCNSampler(
graph, num_partitions,
prefetch_node_feats=['feat', 'label', 'train_mask', 'valid_mask', 'test_mask'])
# DataLoader for generic dataloading with a graph, a set of indices (any indices, like
# partition IDs here), and a graph sampler.
# NodeDataLoader and EdgeDataLoader are simply special cases of DataLoader where the
# indices are guaranteed to be node and edge IDs.
dataloader = dgl.dataloading.DataLoader(
graph,
torch.arange(num_partitions),
sampler,
device='cuda',
batch_size=100,
shuffle=True,
drop_last=False,
num_workers=0,
use_uva=True)
durations = []
for _ in range(10):
t0 = time.time()
model.train()
for it, sg in enumerate(dataloader):
x = sg.ndata['feat']
y = sg.ndata['label'][:, 0]
m = sg.ndata['train_mask']
y_hat = model(sg, x)
loss = F.cross_entropy(y_hat[m], y[m])
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat[m], y[m])
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
model.eval()
with torch.no_grad():
val_preds, test_preds = [], []
val_labels, test_labels = [], []
for it, sg in enumerate(dataloader):
x = sg.ndata['feat']
y = sg.ndata['label'][:, 0]
m_val = sg.ndata['valid_mask']
m_test = sg.ndata['test_mask']
y_hat = model(sg, x)
val_preds.append(y_hat[m_val])
val_labels.append(y[m_val])
test_preds.append(y_hat[m_test])
test_labels.append(y[m_test])
val_preds = torch.cat(val_preds, 0)
val_labels = torch.cat(val_labels, 0)
test_preds = torch.cat(test_preds, 0)
test_labels = torch.cat(test_labels, 0)
val_acc = MF.accuracy(val_preds, val_labels)
test_acc = MF.accuracy(test_preds, test_labels)
print('Validation acc:', val_acc.item(), 'Test acc:', test_acc.item())
print(np.mean(durations[4:]), np.std(durations[4:]))
from . import graph
from . import storages
from .graph import *
from .other_feature import *
from .wrapper import *
class GraphStorage(object):
def get_node_storage(self, key, ntype=None):
pass
def get_edge_storage(self, key, etype=None):
pass
# Required for checking whether a single dict is allowed for ndata and edata.
@property
def ntypes(self):
pass
@property
def canonical_etypes(self):
pass
def etypes(self):
return [etype[1] for etype in self.canonical_etypes]
def sample_neighbors(self, seed_nodes, fanout, edge_dir='in', prob=None,
exclude_edges=None, replace=False, output_device=None):
"""Return a DGLGraph which is a subgraph induced by sampling neighboring edges of
the given nodes.
See ``dgl.sampling.sample_neighbors`` for detailed semantics.
Parameters
----------
seed_nodes : Tensor or dict[str, Tensor]
Node IDs to sample neighbors from.
This argument can take a single ID tensor or a dictionary of node types and ID tensors.
If a single tensor is given, the graph must only have one type of nodes.
fanout : int or dict[etype, int]
The number of edges to be sampled for each node on each edge type.
This argument can take a single int or a dictionary of edge types and ints.
If a single int is given, DGL will sample this number of edges for each node for
every edge type.
If -1 is given for a single edge type, all the neighboring edges with that edge
type will be selected.
prob : str, optional
Feature name used as the (unnormalized) probabilities associated with each
neighboring edge of a node. The feature must have only one element for each
edge.
The features must be non-negative floats, and the sum of the features of
inbound/outbound edges for every node must be positive (though they don't have
to sum up to one). Otherwise, the result will be undefined.
If :attr:`prob` is not None, GPU sampling is not supported.
exclude_edges : Tensor or dict[etype, Tensor]
Edge IDs to exclude during sampling neighbors for the seed nodes.
This argument can take a single ID tensor or a dictionary of edge types and ID tensors.
If a single tensor is given, the graph must only have one type of edges.
replace : bool, optional
If True, sample with replacement.
output_device : Framework-specific device context object, optional
The output device. Default is the same as the input graph.
Returns
-------
DGLGraph
A sampled subgraph with the same nodes as the original graph, but only the sampled neighboring
edges. The induced edge IDs will be in ``edata[dgl.EID]``.
"""
pass
# Required in Cluster-GCN
def subgraph(self, nodes, relabel_nodes=False, output_device=None):
"""Return a subgraph induced on given nodes.
This has the same semantics as ``dgl.node_subgraph``.
Parameters
----------
nodes : nodes or dict[str, nodes]
The nodes to form the subgraph. The allowed nodes formats are:
* Int Tensor: Each element is a node ID. The tensor must have the same device type
and ID data type as the graph's.
* iterable[int]: Each element is a node ID.
* Bool Tensor: Each :math:`i^{th}` element is a bool flag indicating whether
node :math:`i` is in the subgraph.
If the graph is homogeneous, one can directly pass the above formats.
Otherwise, the argument must be a dictionary with keys being node types
and values being the node IDs in the above formats.
relabel_nodes : bool, optional
If True, the extracted subgraph will only have the nodes in the specified node set
and it will relabel the nodes in order.
output_device : Framework-specific device context object, optional
The output device. Default is the same as the input graph.
Returns
-------
DGLGraph
The subgraph.
"""
pass
# Required in Link Prediction
def edge_subgraph(self, edges, relabel_nodes=False, output_device=None):
"""Return a subgraph induced on given edges.
This has the same semantics as ``dgl.edge_subgraph``.
Parameters
----------
edges : edges or dict[(str, str, str), edges]
The edges to form the subgraph. The allowed edges formats are:
* Int Tensor: Each element is an edge ID. The tensor must have the same device type
and ID data type as the graph's.
* iterable[int]: Each element is an edge ID.
* Bool Tensor: Each :math:`i^{th}` element is a bool flag indicating whether
edge :math:`i` is in the subgraph.
If the graph is homogeneous, one can directly pass the above formats.
Otherwise, the argument must be a dictionary with keys being edge types
and values being the edge IDs in the above formats.
relabel_nodes : bool, optional
If True, the extracted subgraph will only have the nodes in the specified node set
and it will relabel the nodes in order.
output_device : Framework-specific device context object, optional
The output device. Default is the same as the input graph.
Returns
-------
DGLGraph
The subgraph.
"""
pass
# Required in Link Prediction negative sampler
def find_edges(self, edges, etype=None, output_device=None):
"""Return the source and destination node IDs given the edge IDs within the given edge type.
"""
pass
# Required in Link Prediction negative sampler
def num_nodes(self, ntype):
"""Return the number of nodes for the given node type."""
pass
def global_uniform_negative_sampling(self, num_samples, exclude_self_loops=True,
replace=False, etype=None):
"""Per source negative sampling as in ``dgl.dataloading.GlobalUniform``"""
from collections.abc import Mapping
from dgl.storages import wrap_storage
from dgl.utils import recursive_apply
# A GraphStorage class where ndata and edata can be any FeatureStorage but
# otherwise the same as the wrapped DGLGraph.
class OtherFeatureGraphStorage(object):
def __init__(self, g, ndata=None, edata=None):
self.g = g
self._ndata = recursive_apply(ndata, wrap_storage) if ndata is not None else {}
self._edata = recursive_apply(edata, wrap_storage) if edata is not None else {}
for k, v in self._ndata.items():
if not isinstance(v, Mapping):
assert len(self.g.ntypes) == 1
self._ndata[k] = {self.g.ntypes[0]: v}
for k, v in self._edata.items():
if not isinstance(v, Mapping):
assert len(self.g.canonical_etypes) == 1
self._edata[k] = {self.g.canonical_etypes[0]: v}
def get_node_storage(self, key, ntype=None):
if ntype is None:
ntype = self.g.ntypes[0]
return self._ndata[key][ntype]
def get_edge_storage(self, key, etype=None):
if etype is None:
etype = self.g.canonical_etypes[0]
return self._edata[key][etype]
def __getattr__(self, key):
# I wrote it in this way because I'm too lazy to write "def sample_neighbors"
# or stuff like that.
if key in ['ntypes', 'etypes', 'canonical_etypes', 'sample_neighbors',
'subgraph', 'edge_subgraph', 'find_edges', 'num_nodes']:
# Delegate to the wrapped DGLGraph instance.
return getattr(self.g, key)
else:
raise AttributeError(key)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
# The OGB import must come after the DGL import if both DGL and PyG are installed. Otherwise DataLoader will hang.
# (This is a long-standing issue)
from ogb.nodeproppred import DglNodePropPredDataset
import dglnew
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
return h
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
# This is an example of using feature storage other than tensors
feat_np = graph.ndata['feat'].numpy()
feat = np.memmap('feat.npy', mode='w+', shape=feat_np.shape, dtype='float32')
print(feat.shape)
feat[:] = feat_np
model = SAGE(feat.shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
graph.create_formats_()
# Because NumpyStorage is registered with memmap, one can directly add numpy memmaps
graph = dglnew.graph.OtherFeatureGraphStorage(graph, ndata={'feat': feat, 'label': labels})
#graph = dglnew.graph.OtherFeatureGraphStorage(graph,
# ndata={'feat': dgl.storages.NumpyStorage(feat), 'label': labels})
sampler = dgl.dataloading.NeighborSampler(
[5, 5, 5], output_device='cpu', prefetch_node_feats=['feat'],
prefetch_labels=['label'])
dataloader = dgl.dataloading.NodeDataLoader(
graph,
train_idx,
sampler,
device='cuda',
batch_size=1000,
shuffle=True,
drop_last=False,
pin_memory=True,
num_workers=4,
use_prefetch_thread=True) # TBD: could probably remove this argument
durations = []
for _ in range(10):
t0 = time.time()
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
x = blocks[0].srcdata['feat']
y = blocks[-1].dstdata['label'][:, 0]
y_hat = model(blocks, x)
loss = F.cross_entropy(y_hat, y)
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat, y)
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
print(np.mean(durations[4:]), np.std(durations[4:]))
......@@ -8,22 +8,12 @@ This repo reproduce the reported speed and performance maximally on Reddit and P
Dependencies
------------
- Python 3.7+ (for string formatting features)
- PyTorch 1.5.0+
- PyTorch 1.9.0+
- sklearn
## Run Experiments.
* For reddit data, you may run the following scripts
## Run Experiments
```bash
python cluster_gcn.py
```
./run_reddit.sh
```
You should be able to see that the final test F1 is around `Test F1-mic0.9612, Test F1-mac0.9399`.
Note that the first run of the provided script is considerably slower than reported in the paper, presumably due to the dataloader used. After caching the partition allocation, the overall speed returns to normal. On a 1080Ti and an Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz machine, training finishes within 45s. After the first epoch, the F1-mic on the validation dataset should be around `0.93`.
* For PPI data, you may run the following scripts
```
./run_ppi.sh
```
You should be able to see the final test F1 is around `Test F1-mic0.9924, Test F1-mac0.9917`. The training finished in 10 mins.
import argparse
import os
import time
import random
import numpy as np
import networkx as nx
import sklearn.preprocessing
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.function as fn
from dgl.data import register_data_args
from modules import GraphSAGE
from sampler import ClusterIter
from utils import Logger, evaluate, save_log_dir, load_data
def main(args):
torch.manual_seed(args.rnd_seed)
np.random.seed(args.rnd_seed)
random.seed(args.rnd_seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
multitask_data = set(['ppi'])
multitask = args.dataset in multitask_data
# load and preprocess dataset
data = load_data(args)
g = data.g
train_mask = g.ndata['train_mask']
val_mask = g.ndata['val_mask']
test_mask = g.ndata['test_mask']
labels = g.ndata['label']
train_nid = np.nonzero(train_mask.data.numpy())[0].astype(np.int64)
# Normalize features
if args.normalize:
feats = g.ndata['feat']
train_feats = feats[train_mask]
scaler = sklearn.preprocessing.StandardScaler()
scaler.fit(train_feats.data.numpy())
features = scaler.transform(feats.data.numpy())
g.ndata['feat'] = torch.FloatTensor(features)
in_feats = g.ndata['feat'].shape[1]
n_classes = data.num_classes
n_edges = g.number_of_edges()
n_train_samples = train_mask.int().sum().item()
n_val_samples = val_mask.int().sum().item()
n_test_samples = test_mask.int().sum().item()
print("""----Data statistics------'
#Edges %d
#Classes %d
#Train samples %d
#Val samples %d
#Test samples %d""" %
(n_edges, n_classes,
n_train_samples,
n_val_samples,
n_test_samples))
# create GCN model
if args.self_loop and not args.dataset.startswith('reddit'):
g = dgl.remove_self_loop(g)
g = dgl.add_self_loop(g)
print("adding self-loop edges")
# metis only support int64 graph
g = g.long()
if args.use_pp:
g.update_all(fn.copy_u('feat', 'm'), fn.sum('m', 'feat_agg'))
g.ndata['feat'] = torch.cat([g.ndata['feat'], g.ndata['feat_agg']], 1)
del g.ndata['feat_agg']
cluster_iterator = dgl.dataloading.GraphDataLoader(
dgl.dataloading.ClusterGCNSubgraphIterator(
dgl.node_subgraph(g, train_nid), args.psize, './cache'),
batch_size=args.batch_size, num_workers=4)
#cluster_iterator = ClusterIter(
# args.dataset, g, args.psize, args.batch_size, train_nid, use_pp=args.use_pp)
# set device for dataset tensors
if args.gpu < 0:
cuda = False
else:
cuda = True
torch.cuda.set_device(args.gpu)
val_mask = val_mask.cuda()
test_mask = test_mask.cuda()
g = g.int().to(args.gpu)
print('labels shape:', g.ndata['label'].shape)
print("features shape, ", g.ndata['feat'].shape)
model = GraphSAGE(in_feats,
args.n_hidden,
n_classes,
args.n_layers,
F.relu,
args.dropout,
args.use_pp)
if cuda:
model.cuda()
# logger and so on
log_dir = save_log_dir(args)
logger = Logger(os.path.join(log_dir, 'loggings'))
logger.write(args)
# Loss function
if multitask:
print('Using multi-label loss')
loss_f = nn.BCEWithLogitsLoss()
else:
print('Using multi-class loss')
loss_f = nn.CrossEntropyLoss()
# use optimizer
optimizer = torch.optim.Adam(model.parameters(),
lr=args.lr,
weight_decay=args.weight_decay)
# set train_nids to cuda tensor
if cuda:
train_nid = torch.from_numpy(train_nid).cuda()
print("current memory after model before training",
torch.cuda.memory_allocated(device=train_nid.device) / 1024 / 1024)
start_time = time.time()
best_f1 = -1
for epoch in range(args.n_epochs):
for j, cluster in enumerate(cluster_iterator):
# sync with upper level training graph
if cuda:
cluster = cluster.to(torch.cuda.current_device())
model.train()
# forward
batch_labels = cluster.ndata['label']
batch_train_mask = cluster.ndata['train_mask']
if batch_train_mask.sum().item() == 0:
continue
pred = model(cluster)
loss = loss_f(pred[batch_train_mask],
batch_labels[batch_train_mask])
optimizer.zero_grad()
loss.backward()
optimizer.step()
# in PPI case, `log_every` is chosen to log one time per epoch.
# Choose your log freq dynamically when you want more info within one epoch
if j % args.log_every == 0:
print(f"epoch:{epoch}/{args.n_epochs}, Iteration {j}/"
f"{len(cluster_iterator)}:training loss", loss.item())
print("current memory:",
torch.cuda.memory_allocated(device=pred.device) / 1024 / 1024)
# evaluate
if epoch % args.val_every == 0:
val_f1_mic, val_f1_mac = evaluate(
model, g, labels, val_mask, multitask)
print(
"Val F1-mic{:.4f}, Val F1-mac{:.4f}". format(val_f1_mic, val_f1_mac))
if val_f1_mic > best_f1:
best_f1 = val_f1_mic
print('new best val f1:', best_f1)
torch.save(model.state_dict(), os.path.join(
log_dir, 'best_model.pkl'))
end_time = time.time()
print(f'training using time {end_time - start_time}')
# test
if args.use_val:
model.load_state_dict(torch.load(os.path.join(
log_dir, 'best_model.pkl')))
test_f1_mic, test_f1_mac = evaluate(
model, g, labels, test_mask, multitask)
print("Test F1-mic{:.4f}, Test F1-mac{:.4f}". format(test_f1_mic, test_f1_mac))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument("--dropout", type=float, default=0.5,
help="dropout probability")
parser.add_argument("--gpu", type=int, default=-1,
help="gpu")
parser.add_argument("--lr", type=float, default=3e-2,
help="learning rate")
parser.add_argument("--n-epochs", type=int, default=200,
help="number of training epochs")
parser.add_argument("--log-every", type=int, default=100,
help="the frequency to save model")
parser.add_argument("--batch-size", type=int, default=20,
help="batch size")
parser.add_argument("--psize", type=int, default=1500,
help="partition number")
parser.add_argument("--test-batch-size", type=int, default=1000,
help="test batch size")
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden gcn units")
parser.add_argument("--n-layers", type=int, default=1,
help="number of hidden gcn layers")
parser.add_argument("--val-every", type=int, default=1,
help="number of epoch of doing inference on validation")
parser.add_argument("--rnd-seed", type=int, default=3,
help="number of epoch of doing inference on validation")
parser.add_argument("--self-loop", action='store_true',
help="graph self-loop (default=False)")
parser.add_argument("--use-pp", action='store_true',
help="whether to use precomputation")
parser.add_argument("--normalize", action='store_true',
help="whether to use normalized feature")
parser.add_argument("--use-val", action='store_true',
help="whether to use validated best model to test")
parser.add_argument("--weight-decay", type=float, default=5e-4,
help="Weight for L2 loss")
parser.add_argument("--note", type=str, default='none',
help="note for log dir")
args = parser.parse_args()
print(args)
main(args)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
from ogb.nodeproppred import DglNodePropPredDataset
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, sg, x):
h = x
for l, layer in enumerate(self.layers):
h = layer(sg, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
return h
dataset = dgl.data.AsNodePredDataset(DglNodePropPredDataset('ogbn-products'))
graph = dataset[0] # already prepares ndata['label'/'train_mask'/'val_mask'/'test_mask']
model = SAGE(graph.ndata['feat'].shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
num_partitions = 1000
sampler = dgl.dataloading.ClusterGCNSampler(
graph, num_partitions,
prefetch_ndata=['feat', 'label', 'train_mask', 'val_mask', 'test_mask'])
# DataLoader for generic dataloading with a graph, a set of indices (any indices, like
# partition IDs here), and a graph sampler.
# NodeDataLoader and EdgeDataLoader are simply special cases of DataLoader where the
# indices are guaranteed to be node and edge IDs.
dataloader = dgl.dataloading.DataLoader(
graph,
torch.arange(num_partitions).to('cuda'),
sampler,
device='cuda',
batch_size=100,
shuffle=True,
drop_last=False,
num_workers=0,
use_uva=True)
durations = []
for _ in range(10):
t0 = time.time()
model.train()
for it, sg in enumerate(dataloader):
x = sg.ndata['feat']
y = sg.ndata['label']
m = sg.ndata['train_mask'].bool()
y_hat = model(sg, x)
loss = F.cross_entropy(y_hat[m], y[m])
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat[m], y[m])
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
model.eval()
with torch.no_grad():
val_preds, test_preds = [], []
val_labels, test_labels = [], []
for it, sg in enumerate(dataloader):
x = sg.ndata['feat']
y = sg.ndata['label']
m_val = sg.ndata['val_mask'].bool()
m_test = sg.ndata['test_mask'].bool()
y_hat = model(sg, x)
val_preds.append(y_hat[m_val])
val_labels.append(y[m_val])
test_preds.append(y_hat[m_test])
test_labels.append(y[m_test])
val_preds = torch.cat(val_preds, 0)
val_labels = torch.cat(val_labels, 0)
test_preds = torch.cat(test_preds, 0)
test_labels = torch.cat(test_labels, 0)
val_acc = MF.accuracy(val_preds, val_labels)
test_acc = MF.accuracy(test_preds, test_labels)
print('Validation acc:', val_acc.item(), 'Test acc:', test_acc.item())
print(np.mean(durations[4:]), np.std(durations[4:]))
import math
import dgl.function as fn
import torch
import torch.nn as nn
class GraphSAGELayer(nn.Module):
def __init__(self,
in_feats,
out_feats,
activation,
dropout,
bias=True,
use_pp=False,
use_lynorm=True):
super(GraphSAGELayer, self).__init__()
# The input feature size gets doubled as we concatenated the original
# features with the new features.
self.linear = nn.Linear(2 * in_feats, out_feats, bias=bias)
self.activation = activation
self.use_pp = use_pp
if dropout:
self.dropout = nn.Dropout(p=dropout)
else:
self.dropout = 0.
if use_lynorm:
self.lynorm = nn.LayerNorm(out_feats, elementwise_affine=True)
else:
self.lynorm = lambda x: x
self.reset_parameters()
def reset_parameters(self):
stdv = 1. / math.sqrt(self.linear.weight.size(1))
self.linear.weight.data.uniform_(-stdv, stdv)
if self.linear.bias is not None:
self.linear.bias.data.uniform_(-stdv, stdv)
def forward(self, g, h):
g = g.local_var()
if not self.use_pp:
norm = self.get_norm(g)
g.ndata['h'] = h
g.update_all(fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'))
ah = g.ndata.pop('h')
h = self.concat(h, ah, norm)
if self.dropout:
h = self.dropout(h)
h = self.linear(h)
h = self.lynorm(h)
if self.activation:
h = self.activation(h)
return h
def concat(self, h, ah, norm):
ah = ah * norm
h = torch.cat((h, ah), dim=1)
return h
def get_norm(self, g):
norm = 1. / g.in_degrees().float().unsqueeze(1)
norm[torch.isinf(norm)] = 0
norm = norm.to(self.linear.weight.device)
return norm
class GraphSAGE(nn.Module):
def __init__(self,
in_feats,
n_hidden,
n_classes,
n_layers,
activation,
dropout,
use_pp):
super(GraphSAGE, self).__init__()
self.layers = nn.ModuleList()
# input layer
self.layers.append(GraphSAGELayer(in_feats, n_hidden, activation=activation,
dropout=dropout, use_pp=use_pp, use_lynorm=True))
# hidden layers
for i in range(n_layers - 1):
self.layers.append(
GraphSAGELayer(n_hidden, n_hidden, activation=activation, dropout=dropout,
use_pp=False, use_lynorm=True))
# output layer
self.layers.append(GraphSAGELayer(n_hidden, n_classes, activation=None,
dropout=dropout, use_pp=False, use_lynorm=False))
def forward(self, g):
h = g.ndata['feat']
for layer in self.layers:
h = layer(g, h)
return h