Unverified commit 701b4fcc authored by Quan (Andy) Gan, committed by GitHub

[Sampling] New sampling pipeline plus asynchronous prefetching (#3665)

* initial update

* more

* more

* multi-gpu example

* cluster gcn, finalize homogeneous

* more explanation

* fix

* bunch of fixes

* fix

* RGAT example and more fixes

* shadow-gnn sampler and some changes in unit test

* fix

* wth

* more fixes

* remove shadow+node/edge dataloader tests for possible ux changes

* lints

* add legacy dataloading import just in case

* fix

* update pylint for f-strings

* fix

* lint

* lint

* lint again

* cherry-picking commit fa9f494

* oops

* fix

* add sample_neighbors in dist_graph

* fix

* lint

* fix

* fix

* fix

* fix tutorial

* fix

* fix

* fix

* fix warning

* remove debug

* add get_foo_storage apis

* lint
parent 5152a879
......@@ -17,6 +17,8 @@ and an ``EdgeDataLoader`` for edge/link prediction task.
.. autoclass:: NodeDataLoader
.. autoclass:: EdgeDataLoader
.. autoclass:: GraphDataLoader
.. autoclass:: DistNodeDataLoader
.. autoclass:: DistEdgeDataLoader
.. _api-dataloading-neighbor-sampling:
......
......@@ -202,20 +202,20 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-ba
(see the section of mini-batch training). The low-level APIs require users to write code
to explicitly define how a layer of nodes is sampled (e.g., using :func:`dgl.sampling.sample_neighbors`).
The high-level sampling APIs implement a few popular sampling algorithms for node classification
and link prediction tasks (e.g., :class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`).
The distributed sampling module follows the same design and provides two levels of sampling APIs.
For the lower-level sampling API, it provides :func:`~dgl.distributed.sample_neighbors` for
distributed neighborhood sampling on :class:`~dgl.distributed.DistGraph`. In addition, DGL provides
a distributed DataLoader (:class:`~dgl.distributed.DistDataLoader`) for distributed sampling.
The distributed DataLoader has the same interface as the PyTorch DataLoader except that users cannot
specify the number of worker processes when creating a dataloader. The worker processes are created
in :func:`dgl.distributed.initialize`.
**Note**: When running :func:`dgl.distributed.sample_neighbors` on :class:`~dgl.distributed.DistGraph`,
the sampler cannot run in a PyTorch DataLoader with multiple worker processes. The main reason is that
the PyTorch DataLoader creates new sampling worker processes in every epoch, which leads to creating and
destroying :class:`~dgl.distributed.DistGraph` objects many times.
When using the low-level API, the sampling code is similar to single-process sampling. The only
......@@ -240,16 +240,16 @@ difference is that users need to use :func:`dgl.distributed.sample_neighbors` an
for batch in dataloader:
...
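For reference, a minimal sketch of this low-level pattern (assuming a :class:`~dgl.distributed.DistGraph`
named ``g``, a seed-node tensor ``train_nid``, and a fanout of 10; the ``sample_blocks`` helper and the
argument values are illustrative, not part of the elided example) could look like the following.

.. code:: python

    def sample_blocks(seeds):
        seeds = torch.LongTensor(seeds)
        # Low-level distributed neighbor sampling on the DistGraph.
        frontier = dgl.distributed.sample_neighbors(g, seeds, 10)
        return dgl.to_block(frontier, seeds)

    dataloader = dgl.distributed.DistDataLoader(
        dataset=train_nid, batch_size=1024,
        collate_fn=sample_blocks, shuffle=True, drop_last=False)

    for blocks in dataloader:
        ...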
The high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`) have distributed counterparts
(:class:`~dgl.dataloading.pytorch.DistNodeDataLoader` and
:class:`~dgl.dataloading.pytorch.DistEdgeDataLoader`). Otherwise the code is exactly
the same as single-process sampling.
.. code:: python
sampler = dgl.sampling.MultiLayerNeighborSampler([10, 25])
dataloader = dgl.sampling.DistNodeDataLoader(g, train_nid, sampler,
batch_size=batch_size, shuffle=True)
for batch in dataloader:
...
......
......@@ -177,9 +177,9 @@ DGL provides a sparse Adagrad optimizer, :class:`~dgl.distributed.SparseAdagr
DGL provides two levels of APIs for sampling nodes and edges to generate mini-batch training data (see the chapter on mini-batch training).
The low-level APIs require users to write code that explicitly defines how a layer of nodes is sampled (for example, using :func:`dgl.sampling.sample_neighbors`).
The high-level sampling APIs implement several popular sampling algorithms for node classification and link prediction tasks (for example,
:class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`).
The distributed sampling module follows the same design and also provides two levels of sampling APIs. For the low-level sampling API, it provides
distributed neighbor sampling on :class:`~dgl.distributed.DistGraph` via
......@@ -188,7 +188,7 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-
The distributed DataLoader has the same interface as the PyTorch DataLoader. Its worker processes are created in :func:`dgl.distributed.initialize`.
**Note**: When running :func:`dgl.distributed.sample_neighbors` on a :class:`~dgl.distributed.DistGraph`,
the sampler cannot run in a PyTorch DataLoader with multiple worker processes. The main reason is that the PyTorch DataLoader creates new sampling worker processes in every epoch,
which leads to :class:`~dgl.distributed.DistGraph` objects being created and destroyed many times.
When using the low-level API, the sampling code is similar to single-process sampling. The only difference is that users need to use
......@@ -214,18 +214,18 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-
for batch in dataloader:
...
The high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`) have distributed counterparts,
:class:`~dgl.dataloading.pytorch.DistNodeDataLoader` and
:class:`~dgl.dataloading.pytorch.DistEdgeDataLoader`. When they are used,
the distributed sampling code is almost identical to single-process sampling.
.. code:: python
sampler = dgl.sampling.MultiLayerNeighborSampler([10, 25])
dataloader = dgl.sampling.DistNodeDataLoader(g, train_nid, sampler,
batch_size=batch_size, shuffle=True)
for batch in dataloader:
...
......
......@@ -132,8 +132,8 @@ For transductive models that require node embeddings, DGL
Distributed Sampling
~~~~~~~~~~~~~~~~~~~~
DGL provides two levels of APIs for sampling nodes and edges to generate mini-batches (see the mini-batch training section). With the low-level API, users must write code that explicitly defines how a layer of nodes is sampled (for example, using :func:`dgl.sampling.sample_neighbors`). The high-level APIs implement several popular sampling algorithms used for node classification and link prediction (e.g., :class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`).
The distributed sampling module follows the same design and provides two levels of sampling APIs. For the low-level sampling API, :func:`~dgl.distributed.sample_neighbors` performs distributed neighbor sampling on a :class:`~dgl.distributed.DistGraph`. DGL also provides a distributed data loader, :class:`~dgl.distributed.DistDataLoader`, for distributed sampling. The distributed DataLoader has the same interface as the PyTorch DataLoader, except that users cannot specify the number of worker processes when creating the loader; worker processes are created in :func:`dgl.distributed.initialize`.
......@@ -159,12 +159,12 @@ When using the low-level API, the sampling code is similar to single-process samp
for batch in dataloader:
...
The same high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataLoader` and :class:`~dgl.dataloading.pytorch.EdgeDataLoader`) work for both :class:`~dgl.DGLGraph` and :class:`~dgl.distributed.DistGraph`. When using :class:`~dgl.dataloading.pytorch.NodeDataLoader` and :class:`~dgl.dataloading.pytorch.EdgeDataLoader`, the distributed sampling code is exactly the same as single-process sampling code.
.. code:: python
sampler = dgl.sampling.MultiLayerNeighborSampler([10, 25])
dataloader = dgl.sampling.DistNodeDataLoader(g, train_nid, sampler,
batch_size=batch_size, shuffle=True)
for batch in dataloader:
...
......
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
from ogb.nodeproppred import DglNodePropPredDataset
USE_WRAPPER = True
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, sg, x):
h = x
for l, layer in enumerate(self.layers):
h = layer(sg, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
return h
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
graph.ndata['label'] = labels
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph.ndata['train_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, train_idx, True)
graph.ndata['valid_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, valid_idx, True)
graph.ndata['test_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, test_idx, True)
model = SAGE(graph.ndata['feat'].shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
if USE_WRAPPER:
import dglnew
graph.create_formats_()
graph = dglnew.graph.wrapper.DGLGraphStorage(graph)
num_partitions = 1000
sampler = dgl.dataloading.ClusterGCNSampler(
graph, num_partitions,
prefetch_node_feats=['feat', 'label', 'train_mask', 'valid_mask', 'test_mask'])
# DataLoader for generic dataloading with a graph, a set of indices (any indices, like
# partition IDs here), and a graph sampler.
# NodeDataLoader and EdgeDataLoader are simply special cases of DataLoader where the
# indices are guaranteed to be node and edge IDs.
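# Each iteration over this DataLoader yields one induced subgraph ``sg`` covering the
# sampled partitions; the features listed in prefetch_node_feats above are already
# attached to sg.ndata by the time the loop body runs.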
dataloader = dgl.dataloading.DataLoader(
graph,
torch.arange(num_partitions),
sampler,
device='cuda',
batch_size=100,
shuffle=True,
drop_last=False,
pin_memory=True,
num_workers=8,
persistent_workers=True,
use_prefetch_thread=True) # TBD: could probably remove this argument
durations = []
for _ in range(10):
t0 = time.time()
for it, sg in enumerate(dataloader):
x = sg.ndata['feat']
y = sg.ndata['label'][:, 0]
m = sg.ndata['train_mask']
y_hat = model(sg, x)
loss = F.cross_entropy(y_hat[m], y[m])
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat[m], y[m])
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
print(np.mean(durations[4:]), np.std(durations[4:]))
from . import graph
from . import storages
from .graph import *
from .other_feature import *
from .wrapper import *
class GraphStorage(object):
def get_node_storage(self, key, ntype=None):
pass
def get_edge_storage(self, key, etype=None):
pass
# Required for checking whether a single dict is allowed for ndata and edata.
@property
def ntypes(self):
pass
@property
def canonical_etypes(self):
pass
@property
def etypes(self):
# Edge type names derived from the canonical (src type, edge type, dst type) triplets.
return [etype[1] for etype in self.canonical_etypes]
def sample_neighbors(self, seed_nodes, fanout, edge_dir='in', prob=None,
exclude_edges=None, replace=False, output_device=None):
"""Return a DGLGraph which is a subgraph induced by sampling neighboring edges of
the given nodes.
See ``dgl.sampling.sample_neighbors`` for detailed semantics.
Parameters
----------
seed_nodes : Tensor or dict[str, Tensor]
Node IDs to sample neighbors from.
This argument can take a single ID tensor or a dictionary of node types and ID tensors.
If a single tensor is given, the graph must only have one type of nodes.
fanout : int or dict[etype, int]
The number of edges to be sampled for each node on each edge type.
This argument can take a single int or a dictionary of edge types and ints.
If a single int is given, DGL will sample this number of edges for each node for
every edge type.
If -1 is given for a single edge type, all the neighboring edges with that edge
type will be selected.
prob : str, optional
Feature name used as the (unnormalized) probabilities associated with each
neighboring edge of a node. The feature must have only one element for each
edge.
The features must be non-negative floats, and the sum of the features of
inbound/outbound edges for every node must be positive (though they don't have
to sum up to one). Otherwise, the result will be undefined.
If :attr:`prob` is not None, GPU sampling is not supported.
exclude_edges: tensor or dict
Edge IDs to exclude during sampling neighbors for the seed nodes.
This argument can take a single ID tensor or a dictionary of edge types and ID tensors.
If a single tensor is given, the graph must only have one type of edges.
replace : bool, optional
If True, sample with replacement.
output_device : Framework-specific device context object, optional
The output device. Default is the same as the input graph.
Returns
-------
DGLGraph
A sampled subgraph with the same nodes as the original graph, but only the sampled neighboring
edges. The induced edge IDs will be in ``edata[dgl.EID]``.
"""
pass
# Required in Cluster-GCN
def subgraph(self, nodes, relabel_nodes=False, output_device=None):
"""Return a subgraph induced on given nodes.
This has the same semantics as ``dgl.node_subgraph``.
Parameters
----------
nodes : nodes or dict[str, nodes]
The nodes to form the subgraph. The allowed nodes formats are:
* Int Tensor: Each element is a node ID. The tensor must have the same device type
and ID data type as the graph's.
* iterable[int]: Each element is a node ID.
* Bool Tensor: Each :math:`i^{th}` element is a bool flag indicating whether
node :math:`i` is in the subgraph.
If the graph is homogeneous, one can directly pass the above formats.
Otherwise, the argument must be a dictionary with keys being node types
and values being the node IDs in the above formats.
relabel_nodes : bool, optional
If True, the extracted subgraph will only have the nodes in the specified node set
and it will relabel the nodes in order.
output_device : Framework-specific device context object, optional
The output device. Default is the same as the input graph.
Returns
-------
DGLGraph
The subgraph.
"""
pass
# Required in Link Prediction
def edge_subgraph(self, edges, relabel_nodes=False, output_device=None):
"""Return a subgraph induced on given edges.
This has the same semantics as ``dgl.edge_subgraph``.
Parameters
----------
edges : edges or dict[(str, str, str), edges]
The edges to form the subgraph. The allowed edges formats are:
* Int Tensor: Each element is an edge ID. The tensor must have the same device type
and ID data type as the graph's.
* iterable[int]: Each element is an edge ID.
* Bool Tensor: Each :math:`i^{th}` element is a bool flag indicating whether
edge :math:`i` is in the subgraph.
If the graph is homogeneous, one can directly pass the above formats.
Otherwise, the argument must be a dictionary with keys being edge types
and values being the edge IDs in the above formats.
relabel_nodes : bool, optional
If True, the extracted subgraph will only have the nodes in the specified node set
and it will relabel the nodes in order.
output_device : Framework-specific device context object, optional
The output device. Default is the same as the input graph.
Returns
-------
DGLGraph
The subgraph.
"""
pass
# Required in Link Prediction negative sampler
def find_edges(self, edges, etype=None, output_device=None):
"""Return the source and destination node IDs given the edge IDs within the given edge type.
"""
pass
# Required in Link Prediction negative sampler
def num_nodes(self, ntype):
"""Return the number of nodes for the given node type."""
pass
def global_uniform_negative_sampling(self, num_samples, exclude_self_loops=True,
replace=False, etype=None):
"""Per source negative sampling as in ``dgl.dataloading.GlobalUniform``"""
from collections.abc import Mapping
from dgl.storages import wrap_storage
from dgl.utils import recursive_apply
# A GraphStorage class where ndata and edata can be any FeatureStorage but
# otherwise the same as the wrapped DGLGraph.
class OtherFeatureGraphStorage(object):
def __init__(self, g, ndata=None, edata=None):
self.g = g
self._ndata = recursive_apply(ndata, wrap_storage) if ndata is not None else {}
self._edata = recursive_apply(edata, wrap_storage) if edata is not None else {}
for k, v in self._ndata.items():
if not isinstance(v, Mapping):
assert len(self.g.ntypes) == 1
self._ndata[k] = {self.g.ntypes[0]: v}
for k, v in self._edata.items():
if not isinstance(v, Mapping):
assert len(self.g.canonical_etypes) == 1
self._edata[k] = {self.g.canonical_etypes[0]: v}
def get_node_storage(self, key, ntype=None):
if ntype is None:
ntype = self.g.ntypes[0]
return self._ndata[key][ntype]
def get_edge_storage(self, key, etype=None):
if etype is None:
etype = self.g.canonical_etypes[0]
return self._edata[key][etype]
def __getattr__(self, key):
# Forward graph-structure queries to the wrapped DGLGraph rather than
# reimplementing each method on this wrapper.
if key in ['ntypes', 'etypes', 'canonical_etypes', 'sample_neighbors',
'subgraph', 'edge_subgraph', 'find_edges', 'num_nodes']:
# Delegate to the wrapped DGLGraph instance.
return getattr(self.g, key)
else:
# ``object`` defines no ``__getattr__``, so raise AttributeError directly for unknown keys.
raise AttributeError(key)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
import torch.distributed.optim
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
from ogb.nodeproppred import DglNodePropPredDataset
USE_WRAPPER = False
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
return h
def train(rank, world_size, graph, num_classes, split_idx):
torch.cuda.set_device(rank)
dist.init_process_group('nccl', 'tcp://127.0.0.1:12347', world_size=world_size, rank=rank)
model = SAGE(graph.ndata['feat'].shape[1], 256, num_classes).cuda()
model = nn.parallel.DistributedDataParallel(model, device_ids=[rank], output_device=rank)
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
if USE_WRAPPER:
import dglnew
graph = dglnew.graph.wrapper.DGLGraphStorage(graph)
sampler = dgl.dataloading.NeighborSampler(
[5, 5, 5], output_device='cpu', prefetch_node_feats=['feat'],
prefetch_labels=['label'])
dataloader = dgl.dataloading.NodeDataLoader(
graph,
train_idx,
sampler,
device='cuda',
batch_size=1000,
shuffle=True,
drop_last=False,
pin_memory=True,
num_workers=4,
persistent_workers=True,
use_ddp=True,
use_prefetch_thread=True) # TBD: could probably remove this argument
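# With use_ddp=True the loader splits train_idx across the DDP ranks, so each
# process trains on a distinct shard of seed nodes every epoch.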
durations = []
for _ in range(10):
t0 = time.time()
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
x = blocks[0].srcdata['feat']
y = blocks[-1].dstdata['label'][:, 0]
y_hat = model(blocks, x)
loss = F.cross_entropy(y_hat, y)
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat, y)
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
if rank == 0:
print(tt - t0)
durations.append(tt - t0)
if rank == 0:
print(np.mean(durations[4:]), np.std(durations[4:]))
if __name__ == '__main__':
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
graph.ndata['label'] = labels
graph.create_formats_()
split_idx = dataset.get_idx_split()
num_classes = dataset.num_classes
n_procs = 4
# Tested with mp.spawn and fork. Both worked and got 4s per epoch with 4 GPUs
# and 3.86s per epoch with 8 GPUs on p2.8x, compared to 5.2s from official examples.
#import torch.multiprocessing as mp
#mp.spawn(train, args=(n_procs, graph, num_classes, split_idx), nprocs=n_procs)
import dgl.multiprocessing as mp
procs = []
for i in range(n_procs):
p = mp.Process(target=train, args=(i, n_procs, graph, num_classes, split_idx))
p.start()
procs.append(p)
for p in procs:
p.join()
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
# OGB must be imported after DGL if both DGL and PyG are installed; otherwise the DataLoader will hang.
# (This is a long-standing issue.)
from ogb.nodeproppred import DglNodePropPredDataset
import dglnew
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
return h
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
# This is an example of using feature storage other than tensors
feat_np = graph.ndata['feat'].numpy()
feat = np.memmap('feat.npy', mode='w+', shape=feat_np.shape, dtype='float32')
print(feat.shape)
feat[:] = feat_np
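# The feature matrix now lives on disk as a float32 memmap; dataloader workers
# read only the rows they need instead of keeping the full matrix in RAM.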
model = SAGE(feat.shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
graph.create_formats_()
# Because NumpyStorage is registered with memmap, one can directly add numpy memmaps
graph = dglnew.graph.OtherFeatureGraphStorage(graph, ndata={'feat': feat, 'label': labels})
#graph = dglnew.graph.OtherFeatureGraphStorage(graph,
# ndata={'feat': dgl.storages.NumpyStorage(feat), 'label': labels})
sampler = dgl.dataloading.NeighborSampler(
[5, 5, 5], output_device='cpu', prefetch_node_feats=['feat'],
prefetch_labels=['label'])
dataloader = dgl.dataloading.NodeDataLoader(
graph,
train_idx,
sampler,
device='cuda',
batch_size=1000,
shuffle=True,
drop_last=False,
pin_memory=True,
num_workers=4,
use_prefetch_thread=True) # TBD: could probably remove this argument
durations = []
for _ in range(10):
t0 = time.time()
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
x = blocks[0].srcdata['feat']
y = blocks[-1].dstdata['label'][:, 0]
y_hat = model(blocks, x)
loss = F.cross_entropy(y_hat, y)
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat, y)
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
print(np.mean(durations[4:]), np.std(durations[4:]))
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
# OGB must be imported after DGL if both DGL and PyG are installed; otherwise the DataLoader will hang.
# (This is a long-standing issue.)
from ogb.nodeproppred import DglNodePropPredDataset
USE_WRAPPER = True
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, pair_graph, neg_pair_graph, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
with pair_graph.local_scope(), neg_pair_graph.local_scope():
pair_graph.ndata['h'] = neg_pair_graph.ndata['h'] = h
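# Score every positive and negative edge by the dot product of its two
# endpoint representations (u_dot_v stores one score per edge in edata['s']).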
pair_graph.apply_edges(dgl.function.u_dot_v('h', 'h', 's'))
neg_pair_graph.apply_edges(dgl.function.u_dot_v('h', 'h', 's'))
return pair_graph.edata['s'], neg_pair_graph.edata['s']
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
graph.ndata['label'] = labels
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
model = SAGE(graph.ndata['feat'].shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
num_edges = graph.num_edges()
train_eids = torch.arange(num_edges)
if USE_WRAPPER:
import dglnew
graph.create_formats_()
graph = dglnew.graph.wrapper.DGLGraphStorage(graph)
sampler = dgl.dataloading.NeighborSampler(
[5, 5, 5], output_device='cpu', prefetch_node_feats=['feat'],
prefetch_labels=['label'])
dataloader = dgl.dataloading.EdgeDataLoader(
graph,
train_eids,
sampler,
device='cuda',
batch_size=1000,
shuffle=True,
drop_last=False,
pin_memory=True,
num_workers=8,
persistent_workers=True,
use_prefetch_thread=True, # TBD: could probably remove this argument
exclude='reverse_id',
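# reverse_eids maps each edge ID to its reverse; XOR with 1 encodes the assumption
# that edges are stored as consecutive (forward, reverse) pairs, i.e. 2k and 2k+1.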
reverse_eids=torch.arange(num_edges) ^ 1,
negative_sampler=dgl.dataloading.negative_sampler.Uniform(5))
durations = []
for _ in range(10):
t0 = time.time()
for it, (input_nodes, pair_graph, neg_pair_graph, blocks) in enumerate(dataloader):
x = blocks[0].srcdata['feat']
pos_score, neg_score = model(pair_graph, neg_pair_graph, blocks, x)
pos_label = torch.ones_like(pos_score)
neg_label = torch.zeros_like(neg_score)
score = torch.cat([pos_score, neg_score])
labels = torch.cat([pos_label, neg_label])
loss = F.binary_cross_entropy_with_logits(score, labels)
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.auroc(score, labels.long())
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
print(np.mean(durations[4:]), np.std(durations[4:]))
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
from ogb.nodeproppred import DglNodePropPredDataset
USE_WRAPPER = True
class SAGE(nn.Module):
def __init__(self, in_feats, n_hidden, n_classes):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
self.dropout = nn.Dropout(0.5)
def forward(self, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
if l != len(self.layers) - 1:
h = F.relu(h)
h = self.dropout(h)
return h
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
graph.ndata['label'] = labels
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
model = SAGE(graph.ndata['feat'].shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
if USE_WRAPPER:
import dglnew
graph.create_formats_()
graph = dglnew.graph.wrapper.DGLGraphStorage(graph)
sampler = dgl.dataloading.NeighborSampler(
[5, 5, 5], output_device='cpu', prefetch_node_feats=['feat'],
prefetch_labels=['label'])
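# prefetch_node_feats / prefetch_labels tell the sampler which features to slice for
# each mini-batch during sampling, so they arrive already attached to
# blocks[0].srcdata and blocks[-1].dstdata (optionally via a separate prefetching
# thread) by the time the training loop sees them.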
dataloader = dgl.dataloading.NodeDataLoader(
graph,
train_idx,
sampler,
device='cuda',
batch_size=1000,
shuffle=True,
drop_last=False,
pin_memory=True,
num_workers=16,
persistent_workers=True,
use_prefetch_thread=True) # TBD: could probably remove this argument
durations = []
for _ in range(10):
t0 = time.time()
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
x = blocks[0].srcdata['feat']
y = blocks[-1].dstdata['label'][:, 0]
y_hat = model(blocks, x)
loss = F.cross_entropy(y_hat, y)
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat, y)
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
print(np.mean(durations[4:]), np.std(durations[4:]))
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.function as fn
import dgl.nn as dglnn
from dgl.utils import recursive_apply
import time
import numpy as np
from ogb.nodeproppred import DglNodePropPredDataset
import tqdm
USE_WRAPPER = True
class HeteroGAT(nn.Module):
def __init__(self, etypes, in_feats, n_hidden, n_classes, n_heads=4):
super().__init__()
self.layers = nn.ModuleList()
self.layers.append(dglnn.HeteroGraphConv({
etype: dglnn.GATConv(in_feats, n_hidden // n_heads, n_heads)
for etype in etypes}))
self.layers.append(dglnn.HeteroGraphConv({
etype: dglnn.GATConv(n_hidden, n_hidden // n_heads, n_heads)
for etype in etypes}))
self.layers.append(dglnn.HeteroGraphConv({
etype: dglnn.GATConv(n_hidden, n_hidden // n_heads, n_heads)
for etype in etypes}))
self.dropout = nn.Dropout(0.5)
self.linear = nn.Linear(n_hidden, n_classes) # Should be HeteroLinear
def forward(self, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
# h may contain tensors with zero rows when a node type has no destination nodes in the
# block; x.view(x.shape[0], -1) would fail there, hence the explicit reshape below.
h = recursive_apply(h, lambda x: x.view(x.shape[0], x.shape[1] * x.shape[2]))
if l != len(self.layers) - 1:
h = recursive_apply(h, F.relu)
h = recursive_apply(h, self.dropout)
return self.linear(h['paper'])
dataset = DglNodePropPredDataset('ogbn-mag')
graph, labels = dataset[0]
graph.ndata['label'] = labels
# Preprocess: add reverse edges in "cites" relation, and add reverse edge types for the
# rest.
graph = dgl.AddReverse()(graph)
# Preprocess: precompute the author, topic, and institution features
graph.update_all(fn.copy_u('feat', 'm'), fn.mean('m', 'feat'), etype='rev_writes')
graph.update_all(fn.copy_u('feat', 'm'), fn.mean('m', 'feat'), etype='has_topic')
graph.update_all(fn.copy_u('feat', 'm'), fn.mean('m', 'feat'), etype='affiliated_with')
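# The three update_all calls average paper features onto authors (rev_writes) and
# fields of study (has_topic), then author features onto institutions (affiliated_with),
# so every node type ends up with an input feature.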
graph.edges['cites'].data['weight'] = torch.ones(graph.num_edges('cites')) # dummy edge weights
model = HeteroGAT(graph.etypes, graph.ndata['feat']['paper'].shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)
if USE_WRAPPER:
import dglnew
graph.create_formats_()
graph = dglnew.graph.wrapper.DGLGraphStorage(graph)
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
sampler = dgl.dataloading.NeighborSampler(
[5, 5, 5], output_device='cpu',
prefetch_node_feats={k: ['feat'] for k in graph.ntypes},
prefetch_labels={'paper': ['label']},
prefetch_edge_feats={'cites': ['weight']})
dataloader = dgl.dataloading.NodeDataLoader(
graph,
train_idx,
sampler,
device='cuda',
batch_size=1000,
shuffle=True,
drop_last=False,
pin_memory=True,
num_workers=8,
persistent_workers=True,
use_prefetch_thread=True) # TBD: could probably remove this argument
durations = []
for _ in range(10):
t0 = time.time()
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
x = blocks[0].srcdata['feat']
y = blocks[-1].dstdata['label']['paper'][:, 0]
assert y.min() >= 0 and y.max() < dataset.num_classes
y_hat = model(blocks, x)
loss = F.cross_entropy(y_hat, y)
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat, y)
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
print(np.mean(durations[4:]), np.std(durations[4:]))
......@@ -69,7 +69,7 @@ class SAGE(nn.Module):
y = th.zeros(g.number_of_nodes(), self.n_hidden if l != len(self.layers) - 1 else self.n_classes)
sampler = dgl.dataloading.MultiLayerNeighborSampler([None])
dataloader = dgl.dataloading.DistNodeDataLoader(
g,
th.arange(g.number_of_nodes()),
sampler,
......
......@@ -366,7 +366,7 @@ def run(args, device, data):
val_fanouts = [int(fanout) for fanout in args.validation_fanout.split(',')]
sampler = dgl.dataloading.MultiLayerNeighborSampler(fanouts)
dataloader = dgl.dataloading.DistNodeDataLoader(
g,
{'paper': train_nid},
sampler,
......@@ -375,7 +375,7 @@ def run(args, device, data):
drop_last=False)
valid_sampler = dgl.dataloading.MultiLayerNeighborSampler(val_fanouts)
valid_dataloader = dgl.dataloading.DistNodeDataLoader(
g,
{'paper': val_nid},
valid_sampler,
......@@ -384,7 +384,7 @@ def run(args, device, data):
drop_last=False)
test_sampler = dgl.dataloading.MultiLayerNeighborSampler(val_fanouts)
test_dataloader = dgl.dataloading.DistNodeDataLoader(
g,
{'paper': test_nid},
test_sampler,
......
......@@ -287,6 +287,9 @@ class TemporalEdgeDataLoader(dgl.dataloading.EdgeDataLoader):
if dataloader_kwargs.get('num_workers', 0) > 0:
g.create_formats_()
def __iter__(self):
return iter(self.dataloader)
# ====== Fast Mode ======
# Part of code in reservoir sampling comes from PyG library
......
......@@ -292,7 +292,7 @@ if __name__ == "__main__":
if i < args.epochs-1 and args.fast_mode:
sampler.reset()
print(log_content[0], log_content[1], log_content[2])
except KeyboardInterrupt:
traceback.print_exc()
error_content = "Training Interrupted!"
f.writelines(error_content)
......
......@@ -21,9 +21,11 @@ from . import container
from . import distributed
from . import random
from . import sampling
from . import storages
from . import dataloading
from . import ops
from . import cuda
from . import _dataloading # legacy dataloading modules
from ._ffi.runtime_ctypes import TypeCode
from ._ffi.function import register_func, get_global_func, list_global_func_names, extract_ext_funcs
......
"""The ``dgl.dataloading`` package contains:
* Data loader classes for iterating over a set of nodes or edges in a graph and generating
computation dependencies via neighborhood sampling methods.
* Various sampler classes that perform neighborhood sampling for multi-layer GNNs.
* Negative samplers for link prediction.
For a holistic explanation of how the different components work together,
read the user guide :ref:`guide-minibatch`.
.. note::
This package is experimental and the interfaces may be subject
to changes in future releases. It currently only has implementations in PyTorch.
"""
from .neighbor import *
from .dataloader import *
from .cluster_gcn import *
from .shadow import *
from . import negative_sampler
from .async_transferer import AsyncTransferer
from .. import backend as F
if F.get_preferred_backend() == 'pytorch':
from .pytorch import *
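# Illustrative usage sketch (an assumption, not part of this module): a typical
# node-classification pipeline built from this package, given an in-memory
# DGLGraph ``g`` and a tensor of training node IDs ``train_nid``:
#
#     sampler = NeighborSampler([10, 25], prefetch_node_feats=['feat'])
#     dataloader = NodeDataLoader(g, train_nid, sampler,
#                                 batch_size=1024, shuffle=True, drop_last=False)
#     for input_nodes, output_nodes, blocks in dataloader:
#         ...  # run the model on the blocks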