Unverified commit f13b9b62 authored by Minjie Wang, committed by GitHub

[Doc] Scan the API docs and make many changes (#2080)



* WIP: api

* dgl.sampling, dgl.data

* dgl.sampling; dgl.dataloading

* sampling packages

* convert

* subgraph

* deprecate

* subgraph APIs

* All docstrings for convert/subgraph/transform

* almost all funcs under dgl namespace

* WIP: DGLGraph

* done graph query

* message passing functions

* lint

* fix merge error

* fix test

* lint

* fix
Co-authored-by: Quan Gan <coin2028@hotmail.com>
parent 35e25914
"""
.. _model-sampling:
NodeFlow and Sampling
=======================================
**Author**: Ziyue Huang, Da Zheng, Quan Gan, Jinjing Zhou, Zheng Zhang
"""
################################################################################################
#
# Graph convolutional network
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In an :math:`L`-layer graph convolution network (GCN), given a graph
# :math:`G=(V, E)`, represented as an adjacency matrix :math:`A`, with
# node features :math:`H^{(0)} = X \in \mathbb{R}^{|V| \times d}`, the
# hidden feature of a node :math:`v` in :math:`(l+1)`-th layer
# :math:`h_v^{(l+1)}` depends on the features of all its neighbors in the
# previous layer :math:`h_u^{(l)}`:
#
# .. math::
#
#
#    z_v^{(l+1)} = \sum_{u \in \mathcal{N}(v)} \tilde{A}_{uv} h_u^{(l)} \qquad h_v^{(l+1)} = \sigma ( z_v^{(l+1)} W^{(l)})
#
# where :math:`\mathcal{N}(v)` is the neighborhood of :math:`v`,
# :math:`\tilde{A}` could be any normalized version of :math:`A` such as
# :math:`D^{-1} A` in Kipf et al., :math:`\sigma(\cdot)` is an activation
# function, and :math:`W^{(l)}` is a trainable parameter of the
# :math:`l`-th layer.
#
# In the node classification task you minimize the following loss:
#
# .. math::
#
#
#    \frac{1}{\vert \mathcal{V}_\mathcal{L} \vert} \sum_{v \in \mathcal{V}_\mathcal{L}} f(y_v, z_v^{(L)})
#
# where :math:`y_v` is the label of :math:`v`, and :math:`f(\cdot, \cdot)`
# is a loss function, e.g., cross entropy loss.
#
# While training GCN on the full graph, each node aggregates the hidden
# features of its neighbors to compute its hidden feature in the next
# layer.
#
# In this tutorial, you run GCN on the Reddit dataset constructed by `Hamilton et
# al. <https://arxiv.org/abs/1706.02216>`__, where the nodes are posts and an edge
# connects two posts if the same user commented on both. The task is to predict
# the category that a post belongs to. This graph has 233,000 nodes, 114.6 million
# edges and 41 categories. First load the Reddit graph.
#
import numpy as np
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import RedditDataset
import mxnet as mx
from mxnet import gluon
# Load MXNet as backend
dgl.load_backend('mxnet')
# load dataset
data = RedditDataset(self_loop=True)
train_nid = mx.nd.array(np.nonzero(data.train_mask)[0]).astype(np.int64)
features = mx.nd.array(data.features)
in_feats = features.shape[1]
labels = mx.nd.array(data.labels)
n_classes = data.num_labels
# construct DGLGraph and prepare related data
g = DGLGraph(data.graph, readonly=True)
g.ndata['features'] = features
################################################################################################
# Here you define the node UDF, which has a fully-connected layer:
#
class NodeUpdate(gluon.Block):
    def __init__(self, in_feats, out_feats, activation=None):
        super(NodeUpdate, self).__init__()
        self.dense = gluon.nn.Dense(out_feats, in_units=in_feats)
        self.activation = activation

    def forward(self, node):
        h = node.data['h']
        h = self.dense(h)
        if self.activation:
            h = self.activation(h)
        return {'activation': h}
################################################################################################
# In DGL, you implement GCN on the full graph with ``update_all`` in ``DGLGraph``.
# The following code performs two-layer GCN on the Reddit graph.
#
# number of GCN layers
L = 2
# number of hidden units of a fully connected layer
n_hidden = 64
layers = [NodeUpdate(g.ndata['features'].shape[1], n_hidden, mx.nd.relu),
          NodeUpdate(n_hidden, n_hidden, mx.nd.relu)]
for layer in layers:
    layer.initialize()
h = g.ndata['features']
for i in range(L):
    g.ndata['h'] = h
    g.update_all(message_func=fn.copy_src(src='h', out='m'),
                 reduce_func=fn.sum(msg='m', out='h'),
                 apply_node_func=lambda node: {'h': layers[i](node)['activation']})
    h = g.ndata.pop('h')
##############################################################################
# NodeFlow
# ~~~~~~~~~~~~~~~~~
#
# As the graph scales up to billions of nodes or edges, training on the
# full graph would no longer be efficient or even feasible.
#
# Mini-batch training allows you to control the computation and memory
# usage within some budget. The training loss for each iteration is
#
# .. math::
#
#    \frac{1}{\vert \tilde{\mathcal{V}}_\mathcal{L} \vert} \sum_{v \in \tilde{\mathcal{V}}_\mathcal{L}} f(y_v, z_v^{(L)})
#
# where :math:`\tilde{\mathcal{V}}_\mathcal{L}` is a subset sampled from
# the total labeled nodes :math:`\mathcal{V}_\mathcal{L}` uniformly at
# random.
#
# Stemming from the labeled nodes :math:`\tilde{\mathcal{V}}_\mathcal{L}`
# in a mini-batch and tracing back to the input forms a computational
# dependency graph (a directed acyclic graph [DAG]), which
# captures the computation flow of :math:`Z^{(L)}`.
#
# In the example below, a mini-batch to compute the hidden features of
# node D in layer 2 requires the hidden features of A, B, E, G in layer 1,
# which in turn require the hidden features of C, D, F in layer 0.
#
# |image0|
#
# For that purpose, you define ``NodeFlow`` to represent this computation
# flow.
#
# ``NodeFlow`` is a type of layered graph, where nodes are organized in
# :math:`L + 1` sequential *layers*, and edges only exist between adjacent
# layers, forming *blocks*. You construct ``NodeFlow`` backwards, starting
# from the last layer with all the nodes whose hidden features are
# requested. The set of nodes the next layer depends on forms the previous
# layer. An edge connects a node in the previous layer to another in the
# next layer if the latter depends on the former. Repeat this process
# until all :math:`L + 1` layers are constructed. The features of the nodes in
# each layer, and of the edges in each block, are stored as separate
# tensors.
#
# ``NodeFlow`` provides ``block_compute`` for per-block computation, which
# triggers computation and data propagation from the lower layer to the
# next upper layer.
#
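# As a quick, illustrative check, you can build one ``NodeFlow`` on the Reddit
# graph loaded above and inspect its layered structure. This is only a sketch:
# it uses the ``NeighborSampler`` introduced below and assumes the ``NodeFlow``
# inspection helpers ``num_layers``, ``num_blocks`` and ``layer_size``.

nf_example = next(iter(dgl.contrib.sampling.NeighborSampler(
    g, batch_size=1000, expand_factor=4, neighbor_type='in',
    num_hops=L, seed_nodes=train_nid)))
print('layers:', nf_example.num_layers)    # L + 1 layers
print('blocks:', nf_example.num_blocks)    # L blocks between adjacent layers
for layer_id in range(nf_example.num_layers):
    # the last layer holds the seed nodes; earlier layers hold their sampled dependencies
    print('nodes in layer {}: {}'.format(layer_id, nf_example.layer_size(layer_id)))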
##############################################################################
# Neighbor sampling
# ~~~~~~~~~~~~~~~~~
#
# Real-world graphs often have nodes with large degrees, meaning that even a
# moderately deep (e.g., three-layer) GCN would often depend on the input features
# of the entire graph, even if only the outputs of a few nodes are needed. This
# makes full-graph computation cost-ineffective.
#
# Sampling methods mitigate this computational problem by reducing the
# receptive field effectively. Fig-c above shows one such example.
#
# Instead of using all the :math:`L`-hop neighbors of a node :math:`v`,
# `Hamilton et al. <https://arxiv.org/abs/1706.02216>`__ propose *neighbor
# sampling*, which randomly samples a few neighbors
# :math:`\hat{\mathcal{N}}^{(l)}(v)` to estimate the aggregation
# :math:`z_v^{(l+1)}` over its full neighborhood :math:`\mathcal{N}(v)` in
# the :math:`l`-th GCN layer, using an unbiased estimator
# :math:`\hat{z}_v^{(l+1)}`
#
# .. math::
#
#
#    \hat{z}_v^{(l+1)} = \frac{\vert \mathcal{N}(v) \vert }{\vert \hat{\mathcal{N}}^{(l)}(v) \vert} \sum_{u \in \hat{\mathcal{N}}^{(l)}(v)} \tilde{A}_{uv} \hat{h}_u^{(l)} \qquad
#    \hat{h}_v^{(l+1)} = \sigma ( \hat{z}_v^{(l+1)} W^{(l)} )
#
# Let :math:`D^{(l)}` be the number of neighbors to be sampled for each
# node at the :math:`l`-th layer. With *neighbor sampling*, the receptive field
# size of each node is then bounded by :math:`\prod_{l=0}^{L-1} D^{(l)}`. For
# example, with :math:`L=2` layers and :math:`D^{(l)}=4`, each seed node depends
# on at most :math:`4 \times 4 = 16` input nodes.
#
##############################################################################
# You then implement *neighbor sampling* by ``NodeFlow``:
#
class GCNSampling(gluon.Block):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout,
                 **kwargs):
        super(GCNSampling, self).__init__(**kwargs)
        self.dropout = dropout
        self.n_layers = n_layers
        with self.name_scope():
            self.layers = gluon.nn.Sequential()
            # input layer
            self.layers.add(NodeUpdate(in_feats, n_hidden, activation))
            # hidden layers
            for i in range(1, n_layers-1):
                self.layers.add(NodeUpdate(n_hidden, n_hidden, activation))
            # output layer
            self.layers.add(NodeUpdate(n_hidden, n_classes))

    def forward(self, nf):
        nf.layers[0].data['activation'] = nf.layers[0].data['features']
        for i, layer in enumerate(self.layers):
            h = nf.layers[i].data.pop('activation')
            if self.dropout:
                h = mx.nd.Dropout(h, p=self.dropout)
            nf.layers[i].data['h'] = h
            # block_compute() computes the features of layer i+1 given layer
            # i, with the given message, reduce, and apply functions.
            # Here, you essentially aggregate the neighbor node features in
            # the previous layer, and update them with the `layer` function.
            nf.block_compute(i,
                             fn.copy_src(src='h', out='m'),
                             lambda node: {'h': node.mailbox['m'].mean(axis=1)},
                             layer)
        h = nf.layers[-1].data.pop('activation')
        return h
##############################################################################
# DGL provides ``NeighborSampler`` to construct the ``NodeFlow`` for a
# mini-batch according to the computation logic of neighbor sampling.
# ``NeighborSampler``
# returns an iterator that generates a ``NodeFlow`` each time. This function
# has many options to give users opportunities to customize the behavior
# of the neighbor sampler, including the number of neighbors to sample or
# the number of hops to sample, for example. Please see `its API
# document <https://doc.dgl.ai/api/python/sampler.html>`__ for more
# details.
#
# dropout probability
dropout = 0.2
# batch size
batch_size = 1000
# number of neighbors to sample
num_neighbors = 4
# number of epochs
num_epochs = 1
# initialize the model and cross entropy loss
model = GCNSampling(in_feats, n_hidden, n_classes, L,
                    mx.nd.relu, dropout, prefix='GCN')
model.initialize()
loss_fcn = gluon.loss.SoftmaxCELoss()
# use adam optimizer
trainer = gluon.Trainer(model.collect_params(), 'adam',
                        {'learning_rate': 0.03, 'wd': 0})
for epoch in range(num_epochs):
    i = 0
    for nf in dgl.contrib.sampling.NeighborSampler(g, batch_size,
                                                   num_neighbors,
                                                   neighbor_type='in',
                                                   shuffle=True,
                                                   num_hops=L,
                                                   seed_nodes=train_nid):
        # When `NodeFlow` is generated from `NeighborSampler`, it only contains
        # the topology structure, on which there is no data attached.
        # Users need to call `copy_from_parent` to copy specific data,
        # such as input node features, from the original graph.
        nf.copy_from_parent()
        with mx.autograd.record():
            # forward
            pred = model(nf)
            batch_nids = nf.layer_parent_nid(-1).astype('int64')
            batch_labels = labels[batch_nids]
            # cross entropy loss
            loss = loss_fcn(pred, batch_labels)
            loss = loss.sum() / len(batch_nids)
        # backward
        loss.backward()
        # optimization
        trainer.step(batch_size=1)
        print("Epoch[{}]: loss {}".format(epoch, loss.asscalar()))
        i += 1
        # You only train the model with 32 mini-batches just for demonstration.
        if i >= 32:
            break
##############################################################################
# Control variate
# ~~~~~~~~~~~~~~~
#
# The unbiased estimator :math:`\hat{Z}^{(\cdot)}` used in *neighbor
# sampling* might suffer from high variance, so it still requires a
# relatively large number of neighbors, e.g., :math:`D^{(0)}=25` and
# :math:`D^{(1)}=10` in `Hamilton et
# al. <https://arxiv.org/abs/1706.02216>`__. With *control variate*, a
# standard variance reduction technique widely used in Monte Carlo
# methods, two neighbors per node appear to be sufficient.
#
# The *control variate* method works as follows: to estimate the expectation
# :math:`\mathbb{E} [X] = \theta` of a random variable :math:`X`, find another
# random variable :math:`Y` that is highly correlated with :math:`X` and whose
# expectation :math:`\mathbb{E} [Y]` can be computed easily. The *control
# variate* estimator :math:`\tilde{X}` is
#
# .. math::
#
#    \tilde{X} = X - Y + \mathbb{E} [Y] \qquad \mathbb{VAR} [\tilde{X}] = \mathbb{VAR} [X] + \mathbb{VAR} [Y] - 2 \cdot \mathbb{COV} [X, Y]
#
# If :math:`\mathbb{VAR} [Y] - 2\mathbb{COV} [X, Y] < 0`, then
# :math:`\mathbb{VAR} [\tilde{X}] < \mathbb{VAR} [X]`.
#
# `Chen et al. <https://arxiv.org/abs/1710.10568>`__ proposed a *control
# variate* based estimator for GCN training. Using the history
# :math:`\bar{H}^{(l)}` of the nodes that are not sampled, the modified
# estimator :math:`\hat{z}_v^{(l+1)}` is
#
# .. math::
#
#
#    \hat{z}_v^{(l+1)} = \frac{\vert \mathcal{N}(v) \vert }{\vert \hat{\mathcal{N}}^{(l)}(v) \vert} \sum_{u \in \hat{\mathcal{N}}^{(l)}(v)} \tilde{A}_{uv} ( \hat{h}_u^{(l)} - \bar{h}_u^{(l)} ) + \sum_{u \in \mathcal{N}(v)} \tilde{A}_{uv} \bar{h}_u^{(l)} \\
#    \hat{h}_v^{(l+1)} = \sigma ( \hat{z}_v^{(l+1)} W^{(l)} )
#
# This method can also be *conceptually* implemented in DGL as shown
# here.
#
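# Before the conceptual DGL code, here is a tiny self-contained NumPy sketch
# (hypothetical numbers, not part of the GCN model) illustrating the variance
# formula above: Y is highly correlated with X and has a known mean, so
# X - Y + E[Y] has the same expectation as X but a much smaller variance.
rng = np.random.RandomState(0)
y_sample = rng.normal(loc=1.0, scale=1.0, size=100000)              # E[Y] = 1 is known
x_sample = y_sample + rng.normal(loc=0.5, scale=0.3, size=100000)   # X is correlated with Y
x_tilde = x_sample - y_sample + 1.0                                 # control variate estimator
print('mean of X:', x_sample.mean(), 'mean of X_tilde:', x_tilde.mean())
print('Var[X]:', x_sample.var(), 'Var[X_tilde]:', x_tilde.var())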
have_large_memory = False
# The control-variate sampling code below needs to run on a large-memory
# machine for the Reddit graph.
if have_large_memory:
    g.ndata['h_0'] = features
    for i in range(L):
        g.ndata['h_{}'.format(i+1)] = mx.nd.zeros((features.shape[0], n_hidden))
    # With control-variate sampling, you only need to sample two neighbors to train GCN.
    for nf in dgl.contrib.sampling.NeighborSampler(g, batch_size, expand_factor=2,
                                                   neighbor_type='in', num_hops=L,
                                                   seed_nodes=train_nid):
        for i in range(nf.num_blocks):
            # aggregate history on the original graph
            g.pull(nf.layer_parent_nid(i+1),
                   fn.copy_src(src='h_{}'.format(i), out='m'),
                   lambda node: {'agg_h_{}'.format(i): node.mailbox['m'].mean(axis=1)})
        nf.copy_from_parent()
        h = nf.layers[0].data['features']
        for i in range(nf.num_blocks):
            prev_h = nf.layers[i].data['h_{}'.format(i)]
            # compute delta_h, the difference of the current activation and the history
            nf.layers[i].data['delta_h'] = h - prev_h
            # refresh the old history
            nf.layers[i].data['h_{}'.format(i)] = h.detach()
            # aggregate the delta_h
            nf.block_compute(i,
                             fn.copy_src(src='delta_h', out='m'),
                             lambda node: {'delta_h': node.mailbox['m'].mean(axis=1)})
            delta_h = nf.layers[i + 1].data['delta_h']
            agg_h = nf.layers[i + 1].data['agg_h_{}'.format(i)]
            # control variate estimator
            nf.layers[i + 1].data['h'] = delta_h + agg_h
            nf.apply_layer(i + 1, lambda node: {'h': layer(node.data['h'])})
            h = nf.layers[i + 1].data['h']
        # update history
        nf.copy_to_parent()
##############################################################################
# You can see the full examples here: `MXNet
# code <https://github.com/dmlc/dgl/blob/master/examples/mxnet/sampling/>`__
# and `PyTorch
# code <https://github.com/dmlc/dgl/tree/master/examples/pytorch/sampling>`__.
#
# The figure below shows the performance of the graph convolution network and GraphSage
# with neighbor sampling and control variate sampling on the Reddit
# dataset. Our GraphSage with control variate sampling, when sampling one
# neighbor, can achieve over 96 percent test accuracy. |image1|
#
# More APIs
# ~~~~~~~~~
#
# In fact, ``block_compute`` is one of the APIs that come with
# ``NodeFlow``, which provides flexibility to research new ideas. The
# computation flow underlying the DAG can be executed in one sweep, by
# calling ``prop_flow``.
#
# ``prop_flow`` accepts a list of UDFs. The code below defines node update UDFs
# for each layer and computes a simplified version of GCN with neighbor sampling.
#
apply_node_funcs = [
    lambda node: {'h': layers[0](node)['activation']},
    lambda node: {'h': layers[1](node)['activation']},
]
for nf in dgl.contrib.sampling.NeighborSampler(g, batch_size, num_neighbors,
                                               neighbor_type='in', num_hops=L,
                                               seed_nodes=train_nid):
    nf.copy_from_parent()
    nf.layers[0].data['h'] = nf.layers[0].data['features']
    nf.prop_flow(fn.copy_src(src='h', out='m'),
                 fn.sum(msg='m', out='h'), apply_node_funcs)
##############################################################################
# Internally, ``prop_flow`` triggers the computation by fusing together
# all the block computations, from the input to the top. The main
# advantages of this API are 1) simplicity, 2) allowing more system-level
# optimization in the future.
#
# .. |image0| image:: https://data.dgl.ai/tutorial/sampling/NodeFlow.png
# .. |image1| image:: https://data.dgl.ai/tutorial/sampling/sampling_result.png
#
"""
.. _model-graph-store:
Large-Scale Training of Graph Neural Networks
=============================================
**Author**: Da Zheng, Chao Ma, Zheng Zhang
"""
################################################################################################
#
# In real-world tasks, many graphs are very large. For example, a recent
# snapshot of the friendship network of Facebook contains 800 million
# nodes and over 100 billion links. This poses challenges for large-scale
# training of graph neural networks.
#
# To accelerate training on a giant graph, DGL provides two additional
# components: sampler and graph store.
#
# - A sampler constructs small subgraphs (``NodeFlow``) from a given
# (giant) graph. The sampler can run on a local machine as well as on
# remote machines. Also, DGL can launch multiple parallel samplers
# across a set of machines.
#
# - The graph store contains graph embeddings of a giant graph, as well
# as the graph structure. So far, we provide a shared-memory graph
# store to support multi-processing training, which is important for
# training on multiple GPUs and on non-uniform memory access (NUMA)
# machines. The shared-memory graph store has a similar interface to
# ``DGLGraph`` for programming. DGL will also support a distributed
# graph store that can store graph embeddings across machines in the
# future release.
#
# The figure below shows the interaction of the trainer with the samplers
# and the graph store. The trainer takes subgraphs (``NodeFlow``) from the
# sampler and fetches graph embeddings from the graph store before
# training. The trainer can push new graph embeddings to the graph store
# afterward.
#
# |image0|
#
# In this tutorial, we use control-variate sampling to demonstrate how to
# use these three DGL components, extending `the original code of
# control-variate
# sampling <https://doc.dgl.ai/tutorials/models/5_giant_graph/1_sampling_mx.html#sphx-glr-tutorials-models-5-giant-graph-1-sampling-mx-py>`__.
# Because the graph store has a similar API to ``DGLGraph``, the code is
# similar. The tutorial will mainly focus on the difference.
#
# Graph Store
# -----------
#
# The graph store has two parts: the server and the client. We need to run
# the graph store server as a daemon before training. We provide a script
# ``run_store_server.py`` `(link) <https://github.com/dmlc/dgl/blob/master/examples/mxnet/sampling/run_store_server.py>`__
# that runs the graph store server and loads graph data. For example, the
# following command runs a graph store server that loads the Reddit
# dataset and is configured to run with four trainers.
#
# ::
#
#    python3 run_store_server.py --dataset reddit --num-workers 4
#
# The trainer uses the graph store client to access data in the graph
# store. A user only needs to write code in the
# trainer. We first create the graph store client that connects with the
# server. We specify ``store_type`` as ``"shared_mem"`` to connect with the
# shared-memory graph store server.
#
# .. code:: python
#
#    g = dgl.contrib.graph_store.create_graph_from_store("reddit", store_type="shared_mem")
#
# The `sampling
# tutorial <https://doc.dgl.ai/tutorials/models/5_giant_graph/1_sampling_mx.html#sphx-glr-tutorials-models-5-giant-graph-1-sampling-mx-py>`__
# shows the detail of sampling methods and how they are used to train
# graph neural networks such as graph convolution network. As a recap, the
# graph convolution model performs the following computation in each
# layer.
#
# .. math::
#
#
#    z_v^{(l+1)} = \sum_{u \in \mathcal{N}^{(l)}(v)} \tilde{A}_{uv} h_u^{(l)} \qquad
#    h_v^{(l+1)} = \sigma ( z_v^{(l+1)} W^{(l)} )
#
# `Control variate sampling <https://arxiv.org/abs/1710.10568>`__
# approximates :math:`z_v^{(l+1)}` as follows:
#
# .. math::
#
#
#    \hat{z}_v^{(l+1)} = \frac{\vert \mathcal{N}(v) \vert }{\vert \hat{\mathcal{N}}^{(l)}(v) \vert} \sum_{u \in \hat{\mathcal{N}}^{(l)}(v)} \tilde{A}_{uv} ( \hat{h}_u^{(l)} - \bar{h}_u^{(l)} ) + \sum_{u \in \mathcal{N}(v)} \tilde{A}_{uv} \bar{h}_u^{(l)} \\
#    \hat{h}_v^{(l+1)} = \sigma ( \hat{z}_v^{(l+1)} W^{(l)} )
#
# In addition to the approximation, `Chen et
# al. <https://arxiv.org/abs/1710.10568>`__ apply a preprocessing trick
# to reduce the number of hops for sampling neighbors by one. This trick
# works for models such as graph convolution networks and GraphSage. It
# preprocesses the input layer. The original GCN takes :math:`X` as input.
# Instead of taking :math:`X` as the input of the model, the trick
# computes :math:`U^{(0)}=\tilde{A}X` and uses :math:`U^{(0)}` as the
# input of the first layer. In this way, the vertices in the first layer
# do not need to compute an aggregation over their neighborhood, which
# reduces the number of layers to sample by one.
#
# For a giant graph, both :math:`\tilde{A}` and :math:`X` can be very
# large. We need to perform this operation in a distributed fashion. That
# is, each trainer takes part of the computation and the computation is
# distributed among all trainers. We can use ``update_all`` in the graph
# store to perform this computation.
#
# .. code:: python
#
#    g.update_all(fn.copy_src(src='features', out='m'),
#                 fn.sum(msg='m', out='preprocess'),
#                 lambda node: {'preprocess': node.data['preprocess'] * node.data['norm']})
#
# ``update_all`` in the graph store runs in a distributed fashion. That
# is, all trainers need to invoke this function and take part of the
# computation. When a trainer completes its portion, it will wait for
# other trainers to complete before proceeding with its other computation.
#
# The node/edge data now live in the graph store and the access to the
# node/edge data is now a little different. The graph store no longer
# supports data access with ``g.ndata``/``g.edata``, which reads the
# entire node/edge data tensor. Instead, users have to use
# ``g.nodes[node_ids].data[embed_name]`` to access data on some nodes.
# (Note: this method is also allowed in ``DGLGraph`` and ``g.ndata`` is
# simply a short syntax for ``g.nodes[:].data``). In addition, the graph
# store supports ``get_n_repr``/``set_n_repr`` for node data and
# ``get_e_repr``/``set_e_repr`` for edge data.
#
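# The access pattern looks like the following sketch (the node IDs and the
# ``'features'`` field are placeholders for whatever embeddings the store holds):
#
# .. code:: python
#
#    # read a slice of node data from the graph store
#    node_ids = mx.nd.array([0, 1, 2], dtype='int64')
#    feats = g.nodes[node_ids].data['features']
#    # g.ndata['features'] would read the entire tensor and is not supported here
#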
# To initialize the node/edge tensors more efficiently, we provide two new
# methods in the graph store client to initialize node data and edge data
# (i.e., ``init_ndata`` for node data and ``init_edata`` for edge data).
# What happens under the hood is that these two methods send
# initialization commands to the server, and the graph store server
# initializes the node/edge tensors on behalf of the trainers.
#
# Here we show how we should initialize node data for control-variate
# sampling. ``h_i`` stores the history of nodes in layer ``i``;
# ``agg_h_i`` stores the aggregation of the history of neighbor nodes in
# layer ``i``.
#
# .. code:: python
#
#    for i in range(n_layers):
#        g.init_ndata('h_{}'.format(i), (features.shape[0], args.n_hidden), 'float32')
#        g.init_ndata('agg_h_{}'.format(i), (features.shape[0], args.n_hidden), 'float32')
#
# After we initialize node data, we train GCN with control-variate
# sampling as below. The training code takes advantage of preprocessed
# input data in the first layer and works identically to the
# single-process training procedure.
#
# .. code:: python
#
#    for nf in NeighborSampler(g, batch_size, num_neighbors,
#                              neighbor_type='in', num_hops=L-1,
#                              seed_nodes=labeled_nodes):
#        for i in range(nf.num_blocks):
#            # aggregate history on the original graph
#            g.pull(nf.layer_parent_nid(i+1),
#                   fn.copy_src(src='h_{}'.format(i), out='m'),
#                   lambda node: {'agg_h_{}'.format(i): node.mailbox['m'].mean(axis=1)})
#        # We need to copy data in the NodeFlow to the right context.
#        nf.copy_from_parent(ctx=right_context)
#        nf.apply_layer(0, lambda node: {'h': layer(node.data['preprocess'])})
#        h = nf.layers[0].data['h']
#
#        for i in range(nf.num_blocks):
#            prev_h = nf.layers[i].data['h_{}'.format(i)]
#            # compute delta_h, the difference of the current activation and the history
#            nf.layers[i].data['delta_h'] = h - prev_h
#            # refresh the old history
#            nf.layers[i].data['h_{}'.format(i)] = h.detach()
#            # aggregate the delta_h
#            nf.block_compute(i,
#                             fn.copy_src(src='delta_h', out='m'),
#                             lambda node: {'delta_h': node.mailbox['m'].mean(axis=1)})
#            delta_h = nf.layers[i + 1].data['delta_h']
#            agg_h = nf.layers[i + 1].data['agg_h_{}'.format(i)]
#            # control variate estimator
#            nf.layers[i + 1].data['h'] = delta_h + agg_h
#            nf.apply_layer(i + 1, lambda node: {'h': layer(node.data['h'])})
#            h = nf.layers[i + 1].data['h']
#        # update history
#        nf.copy_to_parent()
#
# The complete example code can be found
# `here <https://github.com/dmlc/dgl/tree/master/examples/mxnet/sampling>`__.
#
# After showing how the shared-memory graph store is used with
# control-variate sampling, let’s see how to use it for multi-GPU training
# and how to optimize the training on a non-uniform memory access (NUMA)
# machine. A NUMA machine here means a machine with multiple processors
# and large memory. It works for all backend frameworks as long as the
# framework supports multi-processing training. If we use MXNet as the
# backend, we can use the distributed MXNet kvstore to aggregate gradients
# among processes and use the MXNet launch tool to launch multiple workers
# that run the training script. The command below launches our example
# code for multi-processing GCN training with control variate sampling and
# it runs 4 trainers.
#
# ::
#
#    python3 ../incubator-mxnet/tools/launch.py -n 4 -s 1 --launcher local \
#        python3 examples/mxnet/sampling/multi_process_train.py \
#        --graph-name reddit \
#        --model gcn_cv --num-neighbors 1 \
#        --batch-size 2500 --test-batch-size 5000 \
#        --n-hidden 64
#
# ..
#
# It is fairly easy to enable multi-GPU training. All we need to do is to
# copy data to the right GPU context and invoke the NodeFlow computation in that
# GPU context. As shown above, we specify a context ``right_context`` in
# ``copy_from_parent``.
#
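# A minimal sketch of how ``right_context`` might be chosen per trainer process
# (``DMLC_TASK_ID`` is set by the MXNet launch tool; treat this as illustrative):
#
# .. code:: python
#
#    import os
#    # one GPU per trainer, indexed by the worker rank assigned by the launcher
#    dev_id = int(os.environ.get('DMLC_TASK_ID', 0))
#    right_context = mx.gpu(dev_id)
#    nf.copy_from_parent(ctx=right_context)
#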
# To optimize the computation on a NUMA machine, we need to configure each
# process properly. For example, we should use the same number of
# processes as the number of NUMA nodes (usually equivalent to the number
# of processors) and bind the processes to NUMA nodes. In addition, we
# should reduce the number of OpenMP threads to the number of CPU cores in
# a processor and reduce the number of threads of the MXNet kvstore to a
# small number such as 4.
#
# .. code:: python
#
#    import numa
#    import os
#    if 'DMLC_TASK_ID' in os.environ and int(os.environ['DMLC_TASK_ID']) < 4:
#        # bind the process to a NUMA node.
#        numa.bind([int(os.environ['DMLC_TASK_ID'])])
#        # Reduce the number of OpenMP threads to match the number of
#        # CPU cores of a processor.
#        os.environ['OMP_NUM_THREADS'] = '16'
#    else:
#        # Reduce the number of OpenMP threads in the MXNet KVstore server to 4.
#        os.environ['OMP_NUM_THREADS'] = '4'
#
# Given the configuration above, NUMA-aware multi-processing training can
# accelerate training almost by a factor of 4 as shown in the figure below
# on an X1.32xlarge instance where there are 4 processors, each of which
# has 16 physical CPU cores. We can see that NUMA-unaware training cannot
# take advantage of computation power of the machine. It is even slightly
# slower than just using one of the processors in the machine. NUMA-aware
# training, on the other hand, takes about only 20 seconds to converge to
# the accuracy of 96% with 20 iterations.
#
# |image1|
#
# Distributed Sampler
# -------------------
#
# For many tasks, we found that the sampling takes a significant amount of
# time for the training process on a giant graph. So DGL supports
# distributed samplers for speeding up the sampling process on giant
# graphs. DGL allows users to launch multiple samplers on different
# machines concurrently, and each sampler can send its sampled subgraph
# (``NodeFlow``) to trainer machines continuously.
#
# To use the distributed sampler on DGL, users start both trainer and
# sampler processes on different machines. Users can find the complete
# demo code and launch scripts `in this
# link <https://github.com/dmlc/dgl/tree/master/examples/mxnet/sampling/dis_sampling>`__
# and this tutorial will focus on the main difference between
# single-machine code and distributed code.
#
# For the trainer, developers can migrate existing
# single-machine sampler code to the distributed setting by
# changing just a few lines of code. First, users need to create a
# distributed ``SamplerReceiver`` object before training:
#
# .. code:: python
#
#    sampler = dgl.contrib.sampling.SamplerReceiver(graph, ip_addr, num_sampler)
#
# The ``SamplerReceiver`` class is used for receiving remote subgraphs from
# other machines. This API has three arguments: ``parent_graph``,
# ``ip_address``, and ``number_of_samplers``.
#
# After that, developers can change just one line of existing
# single-machine training code like this:
#
# .. code:: python
#
#    for nf in sampler:
#        for i in range(nf.num_blocks):
#            # aggregate history on the original graph
#            g.pull(nf.layer_parent_nid(i+1),
#                   fn.copy_src(src='h_{}'.format(i), out='m'),
#                   lambda node: {'agg_h_{}'.format(i): node.mailbox['m'].mean(axis=1)})
#
#        ...
#
# Here, we use the code ``for nf in sampler`` to replace the original
# single-machine sampling code:
#
# .. code:: python
#
#    for nf in NeighborSampler(g, batch_size, num_neighbors,
#                              neighbor_type='in', num_hops=L-1,
#                              seed_nodes=labeled_nodes):
#
# All other parts of the original single-machine code remain unchanged.
#
# In addition, developers need to write the sampling logic on the sampler
# machine. For the neighbor sampler, developers can just copy their existing
# single-machine code to the sampler machines like this:
#
# .. code:: python
#
#    sender = dgl.contrib.sampling.SamplerSender(trainer_address)
#
#    ...
#
#    for n in range(num_epoch):
#        for nf in dgl.contrib.sampling.NeighborSampler(graph, batch_size, num_neighbors,
#                                                       neighbor_type='in',
#                                                       shuffle=shuffle,
#                                                       num_workers=num_workers,
#                                                       num_hops=num_hops,
#                                                       add_self_loop=add_self_loop,
#                                                       seed_nodes=seed_nodes):
#            sender.send(nf, trainer_id)
#        # tell trainer I have finished current epoch
#        sender.signal(trainer_id)
#
# The figure below shows the overall performance improvement of training
# GCN and GraphSage on the Reddit dataset after deploying the
# optimizations in this tutorial. Our NUMA optimization speeds up the
# training by a factor of 4. The distributed sampling achieves additional
# 20%-40% speed improvement for different tasks.
#
# |image2|
#
# Scale to giant graphs
# ---------------------
#
# Finally, we would like to demonstrate the scalability of DGL with giant
# synthetic graphs. We create three large power-law graphs with
# `RMAT <http://www.cs.cmu.edu/~christos/PUBLICATIONS/siam04.pdf>`__. Each
# node is associated with 100 features and we compute node embeddings with
# 64 dimensions. The table below shows the training speed and memory consumption
# of GCN with neighbor sampling.
#
# ====== ====== ================== ===========
# #Nodes #Edges Time per epoch (s) Memory (GB)
# ====== ====== ================== ===========
# 5M 250M 4.7 8
# 50M 2.5B 46 75
# 500M 25B 505 740
# ====== ====== ================== ===========
#
# We can see that DGL can scale to graphs with up to 500M nodes and 25B
# edges.
#
# .. |image0| image:: https://data.dgl.ai/tutorial/sampling/arch.png
# .. |image1| image:: https://data.dgl.ai/tutorial/sampling/NUMA_speedup.png
# .. |image2| image:: https://data.dgl.ai/tutorial/sampling/whole_speedup.png
#
.. _tutorials5-index:
Training on giant graphs
=============================
* **Sampling** `[paper] <https://arxiv.org/abs/1710.10568>`__ `[tutorial]
<5_giant_graph/1_sampling_mx.html>`__ `[MXNet code]
<https://github.com/dmlc/dgl/tree/master/examples/mxnet/sampling>`__ `[Pytorch code]
<https://github.com/dmlc/dgl/tree/master/examples/pytorch/sampling>`__:
You can perform neighbor sampling and control-variate sampling to train a
graph convolution network and its variants on a giant graph.
* **Scale to giant graphs** `[tutorial] <5_giant_graph/2_giant.html>`__
`[MXNet code] <https://github.com/dmlc/dgl/tree/master/examples/mxnet/sampling>`__
`[Pytorch code]
<https://github.com/dmlc/dgl/tree/master/examples/pytorch/sampling>`__:
You can find two components (graph store and distributed sampler) to scale to
graphs with hundreds of millions of nodes.
......@@ -6,9 +6,24 @@ dgl.DGLGraph
.. currentmodule:: dgl
.. class:: DGLGraph
Class for storing graph structure and node/edge feature data.
There are a few ways to create a DGLGraph:
* To create a homogeneous graph from Tensor data, use :func:`dgl.graph`.
* To create a heterogeneous graph from Tensor data, use :func:`dgl.heterograph`.
* To create a graph from other data sources, use ``dgl.*`` create ops. See
:ref:`api-graph-create-ops`.
Read the user guide chapter :ref:`guide-graph` for an in-depth explanation about its
usage.
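For example, a minimal way to create a homogeneous graph with three edges
(a sketch using the PyTorch backend):

.. code:: python

    import dgl, torch
    # three edges 0->1, 1->2, 2->3; node count is inferred from the largest ID
    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))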
Querying metagraph structure
----------------------------
Methods for getting information about the node and edge types. They are typically useful
when the graph is heterogeneous.
.. autosummary::
:toctree: ../../generated/
......@@ -19,12 +34,13 @@ Querying metagraph structure
DGLGraph.canonical_etypes
DGLGraph.metagraph
DGLGraph.to_canonical_etype
DGLGraph.get_ntype_id
DGLGraph.get_etype_id
Querying graph structure
------------------------
Methods for getting information about the graph structure such as capacity, connectivity,
neighborhood, etc.
.. autosummary::
:toctree: ../../generated/
......@@ -53,15 +69,20 @@ Querying graph structure
Querying and manipulating sparse format
---------------------------------------
Methods for getting or manipulating the internal storage formats of a ``DGLGraph``.
.. autosummary::
:toctree: ../../generated/
DGLGraph.formats
DGLGraph.create_format_
Querying and manipulating index data type
Querying and manipulating node/edge ID type
-----------------------------------------
Methods for getting or manipulating the data type for storing structure-related
data such as node and edge IDs.
.. autosummary::
:toctree: ../../generated/
......@@ -72,6 +93,9 @@ Querying and manipulating index data type
Using Node/edge features
------------------------
Methods for getting or setting the feature data of nodes and edges.
.. autosummary::
:toctree: ../../generated/
......@@ -85,12 +109,14 @@ Using Node/edge features
DGLGraph.dstnodes
DGLGraph.srcdata
DGLGraph.dstdata
DGLGraph.local_var
DGLGraph.local_scope
Transforming graph
------------------
Methods for generating a new graph by transforming the current ones. Most of them
are alias of the :ref:`api-subgraph-extraction` and :ref:`api-transform`
under the ``dgl`` namespace.
.. autosummary::
:toctree: ../../generated/
......@@ -99,9 +125,16 @@ Transforming graph
DGLGraph.node_type_subgraph
DGLGraph.edge_type_subgraph
DGLGraph.__getitem__
DGLGraph.line_graph
DGLGraph.reverse
DGLGraph.add_self_loop
DGLGraph.remove_self_loop
DGLGraph.to_simple
Adjacency and incidence matrix
---------------------------------
Converting to other formats
---------------------------
Methods for getting the adjacency and the incidence matrix of the graph.
.. autosummary::
:toctree: ../../generated/
......@@ -114,6 +147,8 @@ Converting to other formats
Computing with DGLGraph
-----------------------------
Methods for performing message passing, applying functions on node/edge features, etc.
.. autosummary::
:toctree: ../../generated/
......@@ -130,7 +165,11 @@ Computing with DGLGraph
DGLGraph.filter_edges
Querying batch summary
----------------------
---------------------------------
Methods for getting the batching information if the current graph is a batched
graph generated from :func:`dgl.batch`. They are also widely used in the
:ref:`api-batch`.
.. autosummary::
:toctree: ../../generated/
......@@ -142,6 +181,8 @@ Querying batch summary
Mutating topology
-----------------
Methods for mutating the graph structure *in-place*.
.. autosummary::
:toctree: ../../generated/
......@@ -153,8 +194,20 @@ Mutating topology
Device Control
--------------
Methods for getting or changing the device on which the graph is hosted.
.. autosummary::
:toctree: ../../generated/
DGLGraph.to
DGLGraph.device
Misc
----
Other utility methods.
.. autosummary::
:toctree: ../../generated/
DGLGraph.local_scope
......@@ -4,45 +4,42 @@ dgl.data
=========
.. currentmodule:: dgl.data
.. automodule:: dgl.data
Dataset Classes
---------------
Quick links:
DGL dataset
```````````
* `Node Prediction Datasets`_
* `Edge Prediction Datasets`_
* `Graph Prediction Datasets`_
Base Dataset Class
---------------------------
.. autoclass:: DGLDataset
:members: download, save, load, process, has_cache, __getitem__, __len__
DGL builtin dataset
```````````````````
.. _sstdata:
.. autoclass:: DGLBuiltinDataset
:members: download
Node Prediction Datasets
---------------------------------------
.. _sstdata:
DGL hosted datasets for node classification/regression tasks.
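A typical loading pattern looks like the following sketch (illustrative; the exact
field names follow the individual dataset docs):

.. code:: python

    dataset = dgl.data.CoraGraphDataset()
    g = dataset[0]                      # the dataset holds a single graph
    feat = g.ndata['feat']              # node features
    label = g.ndata['label']            # node labels
    train_mask = g.ndata['train_mask']  # boolean mask of training nodes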
Stanford sentiment treebank dataset
```````````````````````````````````
For more information about the dataset, see `Sentiment Analysis <https://nlp.stanford.edu/sentiment/index.html>`__.
.. autoclass:: SSTDataset
:members: __getitem__, __len__
.. _karateclubdata:
.. _karateclubdata:
Karate club dataset
```````````````````````````````````
.. autoclass:: KarateClubDataset
:members: __getitem__, __len__
.. _citationdata:
Citation network dataset
```````````````````````````````````
.. autoclass:: CoraGraphDataset
:members: __getitem__, __len__
......@@ -52,22 +49,13 @@ Citation network dataset
.. autoclass:: PubmedGraphDataset
:members: __getitem__, __len__
.. _kgdata:
Knowledge graph dataset
.. _corafulldata:
CoraFull dataset
```````````````````````````````````
.. autoclass:: FB15k237Dataset
:members: __getitem__, __len__
.. autoclass:: FB15kDataset
:members: __getitem__, __len__
.. autoclass:: WN18Dataset
.. autoclass:: CoraFullDataset
:members: __getitem__, __len__
.. _rdfdata:
RDF datasets
```````````````````````````````````
......@@ -83,19 +71,9 @@ RDF datasets
.. autoclass:: AMDataset
:members: __getitem__, __len__
.. _corafulldata:
CoraFull dataset
```````````````````````````````````
.. autoclass:: CoraFullDataset
:members: __getitem__, __len__
.. _amazoncobuydata:
Amazon Co-Purchase dataset
```````````````````````````````````
.. autoclass:: AmazonCoBuyComputerDataset
:members: __getitem__, __len__
......@@ -103,60 +81,90 @@ Amazon Co-Purchase dataset
:members: __getitem__, __len__
.. _coauthordata:
Coauthor dataset
```````````````````````````````````
.. autoclass:: CoauthorCSDataset
:members: __getitem__, __len__
.. autoclass:: CoauthorPhysicsDataset
:members: __getitem__, __len__
.. _bitcoinotcdata:
BitcoinOTC dataset
.. _ppidata:
Protein-Protein Interaction dataset
```````````````````````````````````
.. autoclass:: PPIDataset
:members: __getitem__, __len__
.. autoclass:: BitcoinOTCDataset
.. _redditdata:
Reddit dataset
``````````````
.. autoclass:: RedditDataset
:members: __getitem__, __len__
.. _sbmdata:
Symmetric Stochastic Block Model Mixture dataset
````````````````````````````````````````````````
.. autoclass:: SBMMixtureDataset
:members: __getitem__, __len__, collate_fn
ICEWS18 dataset
```````````````````````````````````
.. autoclass:: ICEWS18Dataset
:members: __getitem__, __len__
Edge Prediction Datasets
---------------------------------------
.. _qm7bdata:
DGL hosted datasets for edge classification/regression and link prediction tasks.
QM7b dataset
.. _kgdata:
Knowledge graph dataset
```````````````````````````````````
.. autoclass:: QM7bDataset
.. autoclass:: FB15k237Dataset
:members: __getitem__, __len__
.. autoclass:: FB15kDataset
:members: __getitem__, __len__
.. autoclass:: WN18Dataset
:members: __getitem__, __len__
GDELT dataset
.. _bitcoinotcdata:
BitcoinOTC dataset
```````````````````````````````````
.. autoclass:: BitcoinOTCDataset
:members: __getitem__, __len__
ICEWS18 dataset
```````````````````````````````````
.. autoclass:: ICEWS18Dataset
:members: __getitem__, __len__
GDELT dataset
```````````````````````````````````
.. autoclass:: GDELTDataset
:members: __getitem__, __len__
.. _minigcdataset:
Graph Prediction Datasets
---------------------------------------
DGL hosted datasets for graph classification/regression tasks.
.. _qm7bdata:
QM7b dataset
```````````````````````````````````
.. autoclass:: QM7bDataset
:members: __getitem__, __len__
.. _minigcdataset:
Mini graph classification dataset
`````````````````````````````````
.. autoclass:: MiniGCDataset
:members: __getitem__, __len__
.. _tudata:
TU dataset
``````````
.. autoclass:: TUDataset
:members: __getitem__, __len__
......@@ -164,41 +172,14 @@ TU dataset
:members: __getitem__, __len__
.. _gindataset:
Graph isomorphism network dataset
```````````````````````````````````
A compact subset of graph kernel dataset
.. autoclass:: GINDataset
:members: __getitem__, __len__
.. _ppidata:
Protein-Protein Interaction dataset
```````````````````````````````````
.. autoclass:: PPIDataset
:members: __getitem__, __len__
.. _redditdata:
Reddit dataset
``````````````
.. autoclass:: RedditDataset
:members: __getitem__, __len__
.. _sbmdata:
Symmetric Stochastic Block Model Mixture dataset
````````````````````````````````````````````````
.. autoclass:: SBMMixtureDataset
:members: __getitem__, __len__, collate_fn
Utils
-----
Utilities
-----------------
.. autosummary::
:toctree: ../../generated/
......@@ -214,4 +195,3 @@ Utils
.. autoclass:: dgl.data.utils.Subset
:members: __getitem__, __len__
......@@ -7,47 +7,29 @@ dgl.dataloading
DataLoaders
-----------
PyTorch node/edge DataLoaders
`````````````````````````````
.. currentmodule:: dgl.dataloading.pytorch
The DGL DataLoader for mini-batch training works similarly to PyTorch's DataLoader.
It has a generator interface that returns mini-batches sampled from some given graphs.
DGL provides two DataLoaders: a ``NodeDataLoader`` for the node classification task
and an ``EdgeDataLoader`` for the edge/link prediction task.
.. autoclass:: NodeDataLoader
.. autoclass:: EdgeDataLoader
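A minimal usage sketch (``g`` is a graph, ``train_nids`` the seed node IDs, and the
fanouts are illustrative):

.. code:: python

    sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10])
    dataloader = dgl.dataloading.NodeDataLoader(
        g, train_nids, sampler, batch_size=1024, shuffle=True, drop_last=False)
    for input_nodes, output_nodes, blocks in dataloader:
        ...  # feed the blocks to a model that consumes DGL blocks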
General collating functions
```````````````````````````
.. currentmodule:: dgl.dataloading.dataloader
.. autoclass:: Collator
:members: dataset, collate
.. autoclass:: NodeCollator
:members: dataset, collate
:show-inheritance:
.. autoclass:: EdgeCollator
:members: dataset, collate
:show-inheritance:
.. _api-dataloading-neighbor-sampling:
Neighborhood Sampling Classes
Neighbor Sampler
-----------------------------
.. currentmodule:: dgl.dataloading.neighbor
Base Multi-layer Neighborhood Sampling Class
````````````````````````````````````````````
Neighbor samplers are classes that control the behavior of ``DataLoader`` s
to sample neighbors. All of them inherit the base :class:`BlockSampler` class, but implement
different neighbor sampling strategies by overriding the ``sample_frontier`` or
the ``sample_blocks`` methods.
.. autoclass:: BlockSampler
:members: sample_frontier, sample_blocks
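For instance, a sampler that takes every in-neighbor at every layer could be
sketched as follows (assuming the ``sample_frontier(block_id, g, seed_nodes)``
signature listed above; illustrative only):

.. code:: python

    class FullNeighborSampler(dgl.dataloading.BlockSampler):
        def __init__(self, n_layers):
            super().__init__(n_layers)

        def sample_frontier(self, block_id, g, seed_nodes):
            # the frontier is the subgraph induced by all in-edges of the seeds
            return dgl.in_subgraph(g, seed_nodes)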
Uniform Node-wise Neighbor Sampling (GraphSAGE style)
`````````````````````````````````````````````````````
.. currentmodule:: dgl.dataloading.neighbor
.. autoclass:: MultiLayerNeighborSampler
:members: sample_frontier
:show-inheritance:
......@@ -59,8 +41,10 @@ Uniform Node-wise Neighbor Sampling (GraphSAGE style)
Negative Samplers for Link Prediction
-------------------------------------
.. currentmodule:: dgl.dataloading.negative_sampler
Negative samplers are classes that control the behavior of the ``EdgeDataLoader``
to generate negative edges.
.. autoclass:: Uniform
:members: __call__
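For example (illustrative; ``g``, ``train_eids`` and ``sampler`` are assumed to be
defined as in the ``EdgeDataLoader`` examples):

.. code:: python

    # draw 5 negative destination nodes per positive edge
    neg_sampler = dgl.dataloading.negative_sampler.Uniform(5)
    dataloader = dgl.dataloading.EdgeDataLoader(
        g, train_eids, sampler, negative_sampler=neg_sampler, batch_size=1024)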
......@@ -74,6 +74,11 @@ can be used in any `autograd` system. Also, built-in functions can be used not o
or ``apply_edges`` as shown in the example, but wherever message and reduce functions are
required (e.g. ``pull``, ``push``, ``send_and_recv``).
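For instance, a single message passing call with built-in functions (assuming a graph
``g`` with node feature ``'h'`` and edge feature ``'w'``):

.. code:: python

    import dgl.function as fn
    # message: multiply the source node feature 'h' by the edge feature 'w';
    # reduce: sum the incoming messages into the destination feature 'h_new'
    g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'h_new'))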
.. _api-built-in:
DGL Built-in Function
-------------------------
Here is a cheatsheet of all the DGL built-in functions.
+-------------------------+-----------------------------------------------------------------+-----------------------+
......
......@@ -4,10 +4,15 @@ dgl
=============================
.. currentmodule:: dgl
.. automodule:: dgl
.. _api-graph-create-ops:
Graph Create Ops
-------------------------
Operators for constructing :class:`DGLGraph` from raw data formats.
.. autosummary::
:toctree: ../../generated/
......@@ -24,9 +29,11 @@ Graph Create Ops
.. _api-subgraph-extraction:
Subgraph Extraction Routines
Subgraph Extraction Ops
-------------------------------------
Operators for extracting and returning subgraphs.
.. autosummary::
:toctree: ../../generated/
......@@ -37,8 +44,12 @@ Subgraph Extraction Routines
in_subgraph
out_subgraph
Graph Mutation Routines
---------------------------------
.. _api-transform:
Graph Transform Ops
----------------------------------
Operators for generating new graphs by manipulating the structure of the existing ones.
.. autosummary::
:toctree: ../../generated/
......@@ -50,13 +61,6 @@ Graph Mutation Routines
add_self_loop
remove_self_loop
add_reverse_edges
Graph Transform Routines
----------------------------------
.. autosummary::
:toctree: ../../generated/
reverse
to_bidirected
to_simple
......@@ -69,9 +73,14 @@ Graph Transform Routines
khop_graph
metapath_reachable_graph
Batching and Reading Out
.. _api-batch:
Batching and Reading Out Ops
-------------------------------
Operators for batching multiple graphs into one for batch processing and
operators for computing graph-level representation for both single and batched graphs.
.. autosummary::
:toctree: ../../generated/
......@@ -92,18 +101,22 @@ Batching and Reading Out
topk_nodes
topk_edges
Adjacency Related Routines
Adjacency Related Utilities
-------------------------------
Utilities for computing the adjacency matrix and the Laplacian matrix.
.. autosummary::
:toctree: ../../generated/
khop_adj
laplacian_lambda_max
Propagate Messages by Traversals
Traversals
------------------------------------------
Utilities for traversing graphs.
.. autosummary::
:toctree: ../../generated/
......@@ -115,6 +128,9 @@ Propagate Messages by Traversals
Utilities
-----------------------------------------------
Other utilities for controlling randomness, saving and loading graphs, etc.
.. autosummary::
:toctree: ../../generated/
......
......@@ -5,7 +5,7 @@ dgl.sampling
.. automodule:: dgl.sampling
Random walk sampling functions
Random walk
------------------------------
.. autosummary::
......@@ -14,7 +14,7 @@ Random walk sampling functions
random_walk
pack_traces
Neighbor sampling functions
Neighbor sampling
---------------------------
.. autosummary::
......@@ -22,8 +22,4 @@ Neighbor sampling functions
sample_neighbors
select_topk
Builtin sampler classes for more complicated sampling algorithms
----------------------------------------------------------------
.. autoclass:: RandomWalkNeighborSampler
.. autoclass:: PinSAGESampler
PinSAGESampler
......@@ -5,11 +5,12 @@ API Reference
:maxdepth: 2
dgl
dgl.DGLGraph
dgl.data
nn
dgl.ops
dgl.function
sampling
dgl.dataloading
dgl.DGLGraph
dgl.distributed
dgl.function
nn
dgl.ops
dgl.sampling
udf
......@@ -3,15 +3,6 @@
NN Modules (MXNet)
===================
.. contents:: Contents
:local:
We welcome your contribution! If you want a model to be implemented in DGL as a NN module,
please `create an issue <https://github.com/dmlc/dgl/issues>`_ started with "[Feature Request] NN Module XXXModel".
If you want to contribute a NN module, please `create a pull request <https://github.com/dmlc/dgl/pulls>`_ started
with "[NN] XXXModel in MXNet NN Modules" and our team member would review this PR.
Conv Layers
----------------------------------------
......
......@@ -3,15 +3,6 @@
NN Modules (PyTorch)
====================
.. contents:: Contents
:local:
We welcome your contribution! If you want a model to be implemented in DGL as a NN module,
please `create an issue <https://github.com/dmlc/dgl/issues>`_ started with "[Feature Request] NN Module XXXModel".
If you want to contribute a NN module, please `create a pull request <https://github.com/dmlc/dgl/pulls>`_ started
with "[NN] XXXModel in PyTorch NN Modules" and our team member would review this PR.
.. _apinn-pytorch-conv:
Conv Layers
......
......@@ -3,15 +3,6 @@
NN Modules (Tensorflow)
====================
.. contents:: Contents
:local:
We welcome your contribution! If you want a model to be implemented in DGL as a NN module,
please `create an issue <https://github.com/dmlc/dgl/issues>`_ started with "[Feature Request] NN Module XXXModel".
If you want to contribute a NN module, please `create a pull request <https://github.com/dmlc/dgl/pulls>`_ started
with "[NN] XXXModel in tensorflow NN Modules" and our team member would review this PR.
Conv Layers
----------------------------------------
......
.. _apiudf:
dgl.udf
User-defined Function
==================================================
.. currentmodule:: dgl.udf
......
......@@ -103,14 +103,15 @@ Getting Started
:glob:
api/python/dgl
api/python/dgl.DGLGraph
api/python/dgl.data
api/python/nn
api/python/dgl.ops
api/python/dgl.function
api/python/sampling
api/python/dgl.dataloading
api/python/dgl.DGLGraph
api/python/dgl.distributed
api/python/dgl.function
api/python/nn
api/python/dgl.ops
api/python/dgl.sampling
api/python/udf
.. toctree::
:maxdepth: 3
......
"""DGL root package."""
"""
The ``dgl`` package contains data structure for storing structural and feature data
(i.e., the :class:`DGLGraph` class) and also utilities for generating, manipulating
and transforming graphs.
"""
# Windows compatibility
# This initializes Winsock and performs cleanup at termination as required
import socket
......
......@@ -10,7 +10,7 @@ from . import utils
__all__ = ['batch', 'unbatch', 'batch_hetero', 'unbatch_hetero']
def batch(graphs, ndata=ALL, edata=ALL, *, node_attrs=None, edge_attrs=None):
r"""Batch a collection of ``DGLGraph``s into one graph for more efficient
r"""Batch a collection of :class:`DGLGraph` s into one graph for more efficient
graph computation.
Each input graph becomes one disjoint component of the batched graph. The nodes
......@@ -35,8 +35,8 @@ def batch(graphs, ndata=ALL, edata=ALL, *, node_attrs=None, edge_attrs=None):
The numbers of nodes and edges of the input graphs are accessible via the
:func:`DGLGraph.batch_num_nodes` and :func:`DGLGraph.batch_num_edges` attributes
of the result graph. For homographs, they are 1D integer tensors, with each element
being the number of nodes/edges of the corresponding input graph. For
of the resulting graph. For homogeneous graphs, they are 1D integer tensors,
with each element being the number of nodes/edges of the corresponding input graph. For
heterographs, they are dictionaries of 1D integer tensors, with node
type or edge type as the keys.
......@@ -46,7 +46,7 @@ def batch(graphs, ndata=ALL, edata=ALL, *, node_attrs=None, edge_attrs=None):
By default, node/edge features are batched by concatenating the feature tensors
of all input graphs. This thus requires features of the same name to have
the same data type and feature size. One can pass ``None`` to the ``ndata``
or ``edata`` argument to prevent feature batching, or pass a list of string
or ``edata`` argument to prevent feature batching, or pass a list of strings
to specify which features to batch.
To unbatch the graph back to a list, use the :func:`dgl.unbatch` function.
......@@ -68,7 +68,7 @@ def batch(graphs, ndata=ALL, edata=ALL, *, node_attrs=None, edge_attrs=None):
Examples
--------
Batch homographs
Batch homogeneous graphs
>>> import dgl
>>> import torch as th
......@@ -251,13 +251,13 @@ def unbatch(g, node_split=None, edge_split=None):
"""Revert the batch operation by split the given graph into a list of small ones.
This is the reverse operation of :func:``dgl.batch``. If the ``node_split``
or the ``edge_split`` is not given, it uses the :func:`DGLGraph.batch_num_nodes`
and :func:`DGLGraph.batch_num_edges` of the input graph.
or the ``edge_split`` is not given, it calls :func:`DGLGraph.batch_num_nodes`
and :func:`DGLGraph.batch_num_edges` of the input graph to get the information.
If the ``node_split`` or the ``edge_split`` arguments are given,
it will partition the graph according to the given segments. One must assure
that the partition is valid -- edges of the i^th graph only connect nodes
belonging to the i^th graph. Otherwise, an error will be thrown.
belonging to the i^th graph. Otherwise, DGL will throw an error.
The function supports heterograph input, in which case the two split
section arguments shall be of dictionary type -- similar to the
......
......@@ -35,7 +35,7 @@ def graph(data,
idtype=None,
device=None,
**deprecated_kwargs):
"""Create a graph.
"""Create a graph and return.
Parameters
----------
......@@ -199,7 +199,7 @@ def heterograph(data_dict,
num_nodes_dict=None,
idtype=None,
device=None):
"""Create a heterogeneous graph.
"""Create a heterogeneous graph and return.
Parameters
----------
......@@ -354,33 +354,34 @@ def heterograph(data_dict,
def to_heterogeneous(G, ntypes, etypes, ntype_field=NTYPE,
etype_field=ETYPE, metagraph=None):
"""Convert the given homogeneous graph to a heterogeneous graph.
"""Convert a homogeneous graph to a heterogeneous graph and return.
The input graph should have only one type of nodes and edges. Each node and edge
stores an integer feature (under ``ntype_field`` and ``etype_field``), representing
the type id, which can be used to retrieve the type names stored
in the given ``ntypes`` and ``etypes`` arguments.
stores an integer feature as its type ID
(specified by :attr:`ntype_field` and :attr:`etype_field`).
DGL uses it to retrieve the type names stored in the given
:attr:`ntypes` and :attr:`etypes` arguments.
The function will automatically distinguish edge types that have the same given
type IDs but different src and dst type IDs. For example, we allow both edges A and B
type IDs but different src and dst type IDs. For example, it allows both edges A and B
to have the same type ID 0, but one has (0, 1) and the other as (2, 3) as the
(src, dst) type IDs. In this case, the function will "split" edge type 0 into two types:
(0, ty_A, 1) and (2, ty_B, 3). In another word, these two edges share the same edge
type name, but can be distinguished by a canonical edge type tuple.
This function will copy any node/edge features from :attr:`G` to the returned heterogeneous
graph, except for node/edge types to recover the heterogeneous graph.
type name, but can be distinguished by an edge type triplet.
One can retrieve the IDs of the nodes/edges in :attr:`G` from the returned heterogeneous
graph with node feature ``dgl.NID`` and edge feature ``dgl.EID`` respectively.
The function stores the node and edge IDs in the input graph using the ``dgl.NID``
and ``dgl.EID`` names in the ``ndata`` and ``edata`` of the resulting graph.
It also copies any node/edge features from :attr:`G` to the returned heterogeneous
graph, except for reserved fields for storing type IDs (``dgl.NTYPE`` and ``dgl.ETYPE``)
and node/edge IDs (``dgl.NID`` and ``dgl.EID``).
Parameters
----------
G : DGLGraph
The homogeneous graph.
ntypes : list of str
ntypes : list[str]
The node type names.
etypes : list of str
etypes : list[str]
The edge type names.
ntype_field : str, optional
The feature field used to store node type. (Default: ``dgl.NTYPE``)
......@@ -389,24 +390,18 @@ def to_heterogeneous(G, ntypes, etypes, ntype_field=NTYPE,
metagraph : networkx MultiDiGraph, optional
Metagraph of the returned heterograph.
If provided, DGL assumes that G can indeed be described with the given metagraph.
If None, DGL will infer the metagraph from the given inputs, which would be
potentially slower for large graphs.
If None, DGL will infer the metagraph from the given inputs, which could be
costly for large graphs.
Returns
-------
DGLGraph
A heterogeneous graph. The parent node and edge ID are stored in the column
``dgl.NID`` and ``dgl.EID`` respectively for all node/edge types.
A heterogeneous graph.
Notes
-----
The returned node and edge types may not necessarily be in the same order as
``ntypes`` and ``etypes``. And edge types may be duplicated if the source
and destination types differ.
The node IDs of a single type in the returned heterogeneous graph is ordered
the same as the nodes with the same ``ntype_field`` feature. Edge IDs of
a single type is similar.
``ntypes`` and ``etypes``.
Examples
--------
......@@ -568,15 +563,15 @@ def to_hetero(G, ntypes, etypes, ntype_field=NTYPE, etype_field=ETYPE,
etype_field=etype_field, metagraph=metagraph)
def to_homogeneous(G, ndata=None, edata=None):
"""Convert the given heterogeneous graph to a homogeneous graph.
The returned graph has only one type of nodes and edges.
"""Convert a heterogeneous graph to a homogeneous graph and return.
Node and edge types are stored as features in the returned graph. Each feature
is an integer representing the type id, which can be used to retrieve the type
names stored in ``G.ntypes`` and ``G.etypes`` arguments.
Node and edge types of the input graph are stored as the ``dgl.NTYPE``
and ``dgl.ETYPE`` features in the returned graph.
Each feature is an integer representing the type id, determined by the
:meth:`DGLGraph.get_ntype_id` and :meth:`DGLGraph.get_etype_id` methods.
If all
The function also stores the original node/edge IDs as the ``dgl.NID``
and ``dgl.EID`` features in the returned graph.
Parameters
----------
......@@ -596,8 +591,7 @@ def to_homogeneous(G, ndata=None, edata=None):
Returns
-------
DGLGraph
A homogeneous graph. The parent node and edge type/ID are stored in
columns ``dgl.NTYPE/dgl.NID`` and ``dgl.ETYPE/dgl.EID`` respectively.
A homogeneous graph.
Examples
--------
......@@ -695,7 +689,7 @@ def from_scipy(sp_mat,
eweight_name=None,
idtype=None,
device=None):
"""Create a graph from a SciPy sparse matrix.
"""Create a graph from a SciPy sparse matrix and return.
Parameters
----------
......@@ -785,7 +779,7 @@ def bipartite_from_scipy(sp_mat,
eweight_name=None,
idtype=None,
device=None):
"""Create a unidirectional bipartite graph from a SciPy sparse matrix.
"""Create a uni-directional bipartite graph from a SciPy sparse matrix and return.
The created graph will have two types of nodes ``utype`` and ``vtype`` as well as one
edge type ``etype`` whose edges are from ``utype`` to ``vtype``.
......@@ -881,11 +875,12 @@ def from_networkx(nx_graph,
edge_id_attr_name=None,
idtype=None,
device=None):
"""Create a graph from a NetworkX graph.
"""Create a graph from a NetworkX graph and return.
Creating a DGLGraph from a NetworkX graph is not fast especially for large scales.
It is recommended to first convert a NetworkX graph into a tuple of node-tensors
and then construct a DGLGraph with :func:`dgl.graph`.
.. note::
Creating a DGLGraph from a NetworkX graph is not fast especially for large scales.
It is recommended to first convert a NetworkX graph into a tuple of node-tensors
and then construct a DGLGraph with :func:`dgl.graph`.
Parameters
----------
......@@ -903,7 +898,7 @@ def from_networkx(nx_graph,
The names of the edge attributes to retrieve from the NetworkX graph. If given, DGL
stores the retrieved edge attributes in ``edata`` of the returned graph using their
original names. The attribute data must be convertible to Tensor type (e.g., scalar,
numpy.ndarray, list, etc.). It must be None if :attr:`nx_graph` is undirected.
``numpy.ndarray``, list, etc.). It must be None if :attr:`nx_graph` is undirected.
edge_id_attr_name : str, optional
The name of the edge attribute that stores the edge IDs. If given, DGL will assign edge
IDs accordingly when creating the graph, so the attribute must be valid IDs, i.e.
......@@ -1046,14 +1041,15 @@ def bipartite_from_networkx(nx_graph,
edge_id_attr_name=None,
idtype=None,
device=None):
"""Create a unidirectional bipartite graph from a NetworkX graph.
"""Create a unidirectional bipartite graph from a NetworkX graph and return.
The created graph will have two types of nodes ``utype`` and ``vtype`` as well as one
edge type ``etype`` whose edges are from ``utype`` to ``vtype``.
Creating a DGLGraph from a NetworkX graph is not fast especially for large scales.
It is recommended to first convert a NetworkX graph into a tuple of node-tensors
and then construct a DGLGraph with :func:`dgl.heterograph`.
.. note::
Creating a DGLGraph from a NetworkX graph is not fast especially for large scales.
It is recommended to first convert a NetworkX graph into a tuple of node-tensors
and then construct a DGLGraph with :func:`dgl.heterograph`.
Parameters
----------
......@@ -1074,7 +1070,7 @@ def bipartite_from_networkx(nx_graph,
The names of the node attributes for node type :attr:`utype` to retrieve from the
NetworkX graph. If given, DGL stores the retrieved node attributes in
``nodes[utype].data`` of the returned graph using their original names. The attribute
data must be convertible to Tensor type (e.g., scalar, numpy.array, list, etc.).
data must be convertible to Tensor type (e.g., scalar, ``numpy.ndarray``, list, etc.).
e_attrs : list[str], optional
The names of the edge attributes to retrieve from the NetworkX graph. If given, DGL
stores the retrieved edge attributes in ``edata`` of the returned graph using their
......@@ -1242,14 +1238,16 @@ def bipartite_from_networkx(nx_graph,
return g.to(device)
def to_networkx(g, node_attrs=None, edge_attrs=None):
"""Convert a homogeneous graph to a NetworkX graph.
"""Convert a homogeneous graph to a NetworkX graph and return.
It will save the edge IDs as the ``'id'`` edge attribute in the returned NetworkX graph.
The resulting NetworkX graph also contains the node/edge features of the input graph.
Additionally, DGL saves the edge IDs as the ``'id'`` edge attribute in the
returned NetworkX graph.
Parameters
----------
g : DGLGraph
A homogeneous graph on CPU.
A homogeneous graph.
node_attrs : iterable of str, optional
The node attributes to copy from ``g.ndata``. (Default: None)
edge_attrs : iterable of str, optional
......@@ -1260,6 +1258,10 @@ def to_networkx(g, node_attrs=None, edge_attrs=None):
networkx.DiGraph
The converted NetworkX graph.
Notes
-----
The function only supports CPU graph input.
Examples
--------
The following example uses PyTorch backend.
......
"""Data related package."""
"""The ``dgl.data`` package contains datasets hosted by DGL and also utilities
for downloading, processing, saving and loading data from external resources.
"""
from __future__ import absolute_import
from . import citation_graph as citegrh
......