Commit 877101a9 authored by Mufei Li, committed by GitHub

[Doc] Update Doc for UDFs (#2131)



* Update Doc for UDFs

* add degree bucketing explanation

* Update udf.rst

* Update
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
parent 5e9d2889
.. _apiudf:
User-defined Function
User-defined Functions
==================================================
.. currentmodule:: dgl.udf
User-defined functions (UDFs) are flexible ways to configure message passing computation.
There are two types of UDFs in DGL:
User-defined functions (UDFs) allow arbitrary computation in message passing
(see :ref:`guide-message-passing`) and edge feature update with
:func:`~dgl.DGLGraph.apply_edges`. They bring more flexibility when :ref:`apifunction`
cannot realize a desired computation.
* **Node UDF** of signature ``NodeBatch -> dict``. The argument represents
a batch of nodes. The returned dictionary should have ``str`` type key and ``tensor``
type values.
* **Edge UDF** of signature ``EdgeBatch -> dict``. The argument represents
a batch of edges. The returned dictionary should have ``str`` type key and ``tensor``
type values.
Edge-wise User-defined Function
-------------------------------
The size of the batch dimension is determined by the DGL framework
for good efficiency and small memory footprint. Users should not make
assumptions about the batch dimension.
One can use an edge-wise user-defined function as a message function in message passing or
as the function to apply in :func:`~dgl.DGLGraph.apply_edges`. It takes a batch of edges as input
and returns messages (in message passing) or features (in :func:`~dgl.DGLGraph.apply_edges`)
for each edge. The function may combine the features of the edges and their end nodes in
its computation.
EdgeBatch
---------
Formally, it takes the following form:
The class that can represent a batch of edges.
.. code::
def edge_udf(edges):
"""
Parameters
----------
edges : EdgeBatch
A batch of edges.
Returns
-------
dict[str, tensor]
The messages or edge features generated. It maps a message/feature name to the
corresponding messages/features of all edges in the batch. The order of the
messages/features is the same as the order of the edges in the input argument.
"""
DGL generates :class:`~dgl.udf.EdgeBatch` instances internally, which expose the following
interface for defining ``edge_udf``.
.. autosummary::
:toctree: ../../generated/
......@@ -32,12 +49,33 @@ The class that can represent a batch of edges.
EdgeBatch.data
EdgeBatch.edges
EdgeBatch.batch_size
EdgeBatch.__len__
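The contract described above (one output entry per edge, in input order, with no assumption about the batch size) can be illustrated without DGL. The following is a minimal sketch under stated assumptions: ``make_batch`` is a hypothetical stand-in for :class:`~dgl.udf.EdgeBatch` that uses plain Python lists instead of tensors, and the feature names ``'h'``, ``'w'`` and ``'score'`` are made up for illustration.

```python
from types import SimpleNamespace


def edge_udf(edges):
    # Combine source, destination and edge features, one output entry per edge.
    # The batch size is never hard-coded; it follows from the inputs.
    score = [u + v + e for u, v, e in
             zip(edges.src['h'], edges.dst['h'], edges.data['w'])]
    return {'score': score}


def make_batch(src_ids, dst_ids, node_h, edge_w):
    # Hypothetical stand-in for dgl.udf.EdgeBatch: plain lists, no tensors.
    return SimpleNamespace(
        src={'h': [node_h[u] for u in src_ids]},
        dst={'h': [node_h[v] for v in dst_ids]},
        data={'w': list(edge_w)},
    )


node_h = [10.0, 20.0]

# All three edges (0->1, 1->1, 1->0) in a single batch.
full = make_batch([0, 1, 1], [1, 1, 0], node_h, [1.0, 2.0, 3.0])
out_full = edge_udf(full)['score']

# The same edges split into two batches give the same per-edge results.
part_a = make_batch([0], [1], node_h, [1.0])
part_b = make_batch([1, 1], [1, 0], node_h, [2.0, 3.0])
out_split = edge_udf(part_a)['score'] + edge_udf(part_b)['score']

print(out_full == out_split)
```

A UDF written this way gives the same per-edge results however the framework chooses to batch the edges.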
NodeBatch
---------
Node-wise User-defined Function
-------------------------------
One can use a node-wise user-defined function as a reduce function in message passing. It takes
a batch of nodes as input and returns the updated features for each node. It may combine the
current node features and the messages the nodes received. Formally, it takes the following form:
.. code::
def node_udf(nodes):
"""
Parameters
----------
nodes : NodeBatch
A batch of nodes.
Returns
-------
dict[str, tensor]
The updated node features. It maps a feature name to the corresponding features of
all nodes in the batch. The order of the features is the same as the order of the nodes
in the input argument.
"""
The class that can represent a batch of nodes.
DGL generates :class:`~dgl.udf.NodeBatch` instances internally, which expose the following
interface for defining ``node_udf``.
.. autosummary::
:toctree: ../../generated/
......@@ -46,4 +84,33 @@ The class that can represent a batch of nodes.
NodeBatch.mailbox
NodeBatch.nodes
NodeBatch.batch_size
NodeBatch.__len__
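To make the ``node_udf`` contract concrete, here is a minimal sketch that does not require DGL. It assumes a hypothetical driver, ``run_reduce``, which calls the UDF once per group of nodes that received the same number of messages and writes the results back by node ID. The ``SimpleNamespace`` object is only a stand-in for :class:`~dgl.udf.NodeBatch`, with plain lists in place of tensors.

```python
from types import SimpleNamespace


def node_udf(nodes):
    # nodes.data['h'] holds one entry per node (length N);
    # nodes.mailbox['m'] holds one row of D messages per node (N rows).
    return {'h': [h + sum(msgs) for h, msgs in
                  zip(nodes.data['h'], nodes.mailbox['m'])]}


def run_reduce(h, mailboxes, udf):
    """Call udf once per group of equal message count; write back by node ID.

    mailboxes maps a node ID to the list of messages that node received.
    """
    new_h = list(h)
    by_count = {}
    for nid, msgs in mailboxes.items():
        by_count.setdefault(len(msgs), []).append(nid)
    for nids in by_count.values():
        batch = SimpleNamespace(  # hypothetical stand-in for NodeBatch
            data={'h': [h[n] for n in nids]},
            mailbox={'m': [mailboxes[n] for n in nids]},
        )
        # The UDF returns features in the same order as the input nodes.
        for n, value in zip(nids, udf(batch)['h']):
            new_h[n] = value
    return new_h


# Graph with edges 0->1, 1->1, 1->0 and h = 1 on both nodes:
# node 0 receives [1.0], node 1 receives [1.0, 1.0].
h = [1.0, 1.0]
new_h = run_reduce(h, {0: [1.0], 1: [1.0, 1.0]}, node_udf)
print(new_h)
```

Because the UDF only relies on the per-node layout of its inputs, it works for any batch size the driver happens to pass in.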
Degree Bucketing for Message Passing with User Defined Functions
----------------------------------------------------------------
DGL employs a degree-bucketing mechanism for message passing with UDFs. It groups nodes with
the same in-degree and invokes message passing for each group of nodes. As a result, one should
not make any assumptions about the batch size of :class:`~dgl.udf.NodeBatch` instances.
For a batch of nodes, DGL stacks the incoming messages of each node along the second dimension,
ordered by edge ID. An example goes as follows:
.. code:: python
>>> import dgl
>>> import torch
>>> import dgl.function as fn
>>> g = dgl.graph(([1, 3, 5, 0, 4, 2, 3, 3, 4, 5], [1, 1, 0, 0, 1, 2, 2, 0, 3, 3]))
>>> g.edata['eid'] = torch.arange(10)
>>> def reducer(nodes):
...     print(nodes.mailbox['eid'])
...     return {'n': nodes.mailbox['eid'].sum(1)}
>>> g.update_all(fn.copy_e('eid', 'eid'), reducer)
tensor([[5, 6],
        [8, 9]])
tensor([[3, 7, 2],
        [0, 1, 4]])
Essentially, node #2 and node #3 are grouped into one bucket with in-degree of 2, and node
#0 and node #1 are grouped into one bucket with in-degree of 3. Within each bucket, the
edges are ordered by the edge IDs for each node.
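The grouping itself can be sketched in plain Python. This is an illustration of the idea only, not DGL's implementation: ``degree_buckets`` is a hypothetical helper, and it sorts each node's incoming edges strictly by edge ID. Under that assumption the row for node #0 comes out as ``[2, 3, 7]``, whereas the recorded output above shows ``[3, 7, 2]``, so the column order DGL actually produces may differ from this sketch.

```python
from collections import defaultdict


def degree_buckets(dst, num_nodes):
    """Group nodes by in-degree and stack incoming edge IDs per node.

    Returns a dict mapping in-degree -> (node_ids, mailbox), where
    mailbox[i] lists the IDs of the edges entering node_ids[i].
    """
    # Collect incoming edge IDs per destination node, in edge-ID order.
    in_edges = defaultdict(list)
    for eid, v in enumerate(dst):
        in_edges[v].append(eid)

    # Put nodes with the same in-degree into the same bucket.
    # Nodes without incoming edges receive no messages and are skipped.
    buckets = defaultdict(lambda: ([], []))
    for v in range(num_nodes):
        eids = in_edges.get(v, [])
        if not eids:
            continue
        nodes, mailbox = buckets[len(eids)]
        nodes.append(v)
        mailbox.append(eids)
    return dict(buckets)


# Destination node of each edge, taken from the example graph above.
dst = [1, 1, 0, 0, 1, 2, 2, 0, 3, 3]
buckets = degree_buckets(dst, num_nodes=6)
for degree in sorted(buckets):
    nodes, mailbox = buckets[degree]
    print(degree, nodes, mailbox)
```

Each bucket corresponds to one call of the reduce UDF, so the batch size seen by the UDF equals the number of nodes in that bucket.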
......@@ -29,58 +29,190 @@ class EdgeBatch(object):
@property
def src(self):
"""Return the feature data of the source nodes.
"""Return a view of the source node features for the edges in the batch.
Returns
-------
dict with str keys and tensor values
Features of the source nodes.
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a node feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that retrieves the source node features for edges
>>> def edge_udf(edges):
...     # edges.src['h'] is a tensor of shape (E, 1),
...     # where E is the number of edges in the batch.
...     return {'src': edges.src['h']}
>>> # Copy features from source nodes to edges
>>> g.apply_edges(edge_udf)
>>> g.edata['src']
tensor([[1.],
[1.],
[1.]])
>>> # Use edge UDF in message passing, which is equivalent to dgl.function.copy_u
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('src', 'h'))
>>> g.ndata['h']
tensor([[1.],
[2.]])
"""
return self._src_data
@property
def dst(self):
"""Return the feature data of the destination nodes.
"""Return a view of the destination node features for the edges in the batch.
Returns
-------
dict with str keys and tensor values
Features of the destination nodes.
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a node feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.tensor([[0.], [1.]])
>>> # Define a UDF that retrieves the destination node features for edges
>>> def edge_udf(edges):
...     # edges.dst['h'] is a tensor of shape (E, 1),
...     # where E is the number of edges in the batch.
...     return {'dst': edges.dst['h']}
>>> # Copy features from destination nodes to edges
>>> g.apply_edges(edge_udf)
>>> g.edata['dst']
tensor([[1.],
        [1.],
        [0.]])
>>> # Use edge UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('dst', 'h'))
>>> g.ndata['h']
tensor([[0.],
[2.]])
"""
return self._dst_data
@property
def data(self):
"""Return the edge feature data.
"""Return a view of the edge features for the edges in the batch.
Returns
-------
dict with str keys and tensor values
Features of the edges.
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set an edge feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.edata['h'] = torch.tensor([[1.], [1.], [1.]])
>>> # Define a UDF that retrieves the feature 'h' for all edges
>>> def edge_udf(edges):
...     # edges.data['h'] is a tensor of shape (E, 1),
...     # where E is the number of edges in the batch.
...     return {'data': edges.data['h']}
>>> # Make a copy of the feature with name 'data'
>>> g.apply_edges(edge_udf)
>>> g.edata['data']
tensor([[1.],
[1.],
[1.]])
>>> # Use edge UDF in message passing, which is equivalent to dgl.function.copy_e
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('data', 'h'))
>>> g.ndata['h']
tensor([[1.],
[2.]])
"""
return self._edge_data
def edges(self):
"""Return the edges contained in this batch.
"""Return the edges in the batch.
Returns
-------
Tensor
Source node IDs.
Tensor
Destination node IDs.
Tensor
Edge IDs.
(U, V, EID) : (Tensor, Tensor, Tensor)
The edges in the batch. For each :math:`i`, :math:`(U[i], V[i])` is an edge
from :math:`U[i]` to :math:`V[i]` with ID :math:`EID[i]`.
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> # Define a UDF that retrieves and concatenates the end nodes of the edges
>>> def edge_udf(edges):
...     src, dst, _ = edges.edges()
...     return {'uv': torch.stack([src, dst], dim=1).float()}
>>> # Create a feature 'uv' with the end nodes of the edges
>>> g.apply_edges(edge_udf)
>>> g.edata['uv']
tensor([[0., 1.],
[1., 1.],
[1., 0.]])
>>> # Use edge UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('uv', 'h'))
>>> g.ndata['h']
tensor([[1., 0.],
[1., 2.]])
"""
u, v = self._graph.find_edges(self._eid, etype=self.canonical_etype)
return u, v, self._eid
def batch_size(self):
"""Return the number of edges in this edge batch.
"""Return the number of edges in the batch.
Returns
-------
int
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> # Define a UDF that returns one for each edge
>>> def edge_udf(edges):
...     return {'h': torch.ones(edges.batch_size(), 1)}
>>> # Create a feature 'h'
>>> g.apply_edges(edge_udf)
>>> g.edata['h']
tensor([[1.],
[1.],
[1.]])
>>> # Use edge UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('h', 'h'))
>>> g.ndata['h']
tensor([[1.],
[2.]])
"""
return len(self._eid)
......@@ -124,45 +256,137 @@ class NodeBatch(object):
@property
def data(self):
"""Return the node feature data.
"""Return a view of the node features for the nodes in the batch.
Returns
-------
dict with str keys and tensor values
Features of the nodes.
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received and the original feature
>>> # for each node
>>> def node_udf(nodes):
...     # nodes.data['h'] is a tensor of shape (N, 1),
...     # nodes.mailbox['m'] is a tensor of shape (N, D, 1),
...     # where N is the number of nodes in the batch,
...     # D is the number of messages received per node for this node batch
...     return {'h': nodes.data['h'] + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[2.],
[3.]])
"""
return self._data
@property
def mailbox(self):
"""Return the received messages.
"""Return a view of the messages received.
If no messages received, a ``None`` will be returned.
Examples
--------
The following example uses PyTorch backend.
Returns
-------
dict or None
The messages nodes received. If dict, the keys are
``str`` and the values are ``tensor``.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received and the original feature
>>> # for each node
>>> def node_udf(nodes):
...     # nodes.data['h'] is a tensor of shape (N, 1),
...     # nodes.mailbox['m'] is a tensor of shape (N, D, 1),
...     # where N is the number of nodes in the batch,
...     # D is the number of messages received per node for this node batch
...     return {'h': nodes.data['h'] + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[2.],
[3.]])
"""
return self._msgs
def nodes(self):
"""Return the nodes contained in this batch.
"""Return the nodes in the batch.
Returns
-------
tensor
The nodes.
NID : Tensor
The IDs of the nodes in the batch. :math:`NID[i]` gives the ID of
the i-th node.
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received and the original ID
>>> # for each node
>>> def node_udf(nodes):
...     # nodes.nodes() is a tensor of shape (N),
...     # nodes.mailbox['m'] is a tensor of shape (N, D, 1),
...     # where N is the number of nodes in the batch,
...     # D is the number of messages received per node for this node batch
...     return {'h': nodes.nodes().unsqueeze(-1).float() + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[1.],
[3.]])
"""
return self._nodes
def batch_size(self):
"""Return the number of nodes in this batch.
"""Return the number of nodes in the batch.
Returns
-------
int
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received for each node
>>> # and increments the result by 1
>>> def node_udf(nodes):
...     return {'h': torch.ones(nodes.batch_size(), 1) + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[2.],
[3.]])
"""
return len(self._nodes)
......