Unverified commit 877101a9, authored by Mufei Li and committed by GitHub

[Doc] Update Doc for UDFs (#2131)



* Update Doc for UDFs

* add degree bucketing explanation

* Update udf.rst

* Update
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
parent 5e9d2889
.. _apiudf: .. _apiudf:
User-defined Function User-defined Functions
================================================== ==================================================
.. currentmodule:: dgl.udf .. currentmodule:: dgl.udf
User-defined functions (UDFs) are flexible ways to configure message passing computation. User-defined functions (UDFs) allow arbitrary computation in message passing
There are two types of UDFs in DGL: (see :ref:`guide-message-passing`) and edge feature update with
:func:`~dgl.DGLGraph.apply_edges`. They bring more flexibility when :ref:`apifunction`
cannot realize a desired computation.
* **Node UDF** of signature ``NodeBatch -> dict``. The argument represents Edge-wise User-defined Function
a batch of nodes. The returned dictionary should have ``str`` type key and ``tensor`` -------------------------------
type values.
* **Edge UDF** of signature ``EdgeBatch -> dict``. The argument represents
a batch of edges. The returned dictionary should have ``str`` type key and ``tensor``
type values.
The size of the batch dimension is determined by the DGL framework One can use an edge-wise user defined function for a message function in message passing or
for good efficiency and small memory footprint. Users should not make a function to apply in :func:`~dgl.DGLGraph.apply_edges`. It takes a batch of edges as input
assumption in the batch dimension. and returns messages (in message passing) or features (in :func:`~dgl.DGLGraph.apply_edges`)
for each edge. The function may combine the features of the edges and their end nodes in
computation.
EdgeBatch Formally, it takes the following form
---------
The class that can represent a batch of edges. .. code::
    def edge_udf(edges):
        """
        Parameters
        ----------
        edges : EdgeBatch
            A batch of edges.

        Returns
        -------
        dict[str, tensor]
            The messages or edge features generated. It maps a message/feature name to the
            corresponding messages/features of all edges in the batch. The order of the
            messages/features is the same as the order of the edges in the input argument.
        """
DGL generates :class:`~dgl.udf.EdgeBatch` instances internally, which expose the following
interface for defining ``edge_udf``.
.. autosummary:: .. autosummary::
:toctree: ../../generated/ :toctree: ../../generated/
...@@ -32,12 +49,33 @@ The class that can represent a batch of edges. ...@@ -32,12 +49,33 @@ The class that can represent a batch of edges.
EdgeBatch.data EdgeBatch.data
EdgeBatch.edges EdgeBatch.edges
EdgeBatch.batch_size EdgeBatch.batch_size
EdgeBatch.__len__
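The contract of an edge-wise UDF can be illustrated without DGL installed. The sketch below is only an approximation: ``ToyEdgeBatch`` is a hypothetical stand-in written for this example (it is not part of DGL), and it uses plain Python lists where DGL would supply tensors. It shows a UDF that combines source, destination and edge features and returns one value per edge, in the order of the input edges.

```python
# Hypothetical stand-in for dgl.udf.EdgeBatch; NOT part of DGL.
class ToyEdgeBatch:
    def __init__(self, eids, src, dst, data):
        self._eids = eids
        self.src = src    # feature name -> source node feature, one entry per edge
        self.dst = dst    # feature name -> destination node feature, one entry per edge
        self.data = data  # feature name -> edge feature, one entry per edge

    def batch_size(self):
        return len(self._eids)


def edge_udf(edges):
    # One message per edge, in the same order as the edges in the batch.
    msgs = [u + v + w for u, v, w in
            zip(edges.src['h'], edges.dst['h'], edges.data['w'])]
    return {'m': msgs}


# Edges (0->1), (1->1), (1->0); node feature h and edge feature w are made up.
node_h = [10, 20]
src_ids, dst_ids = [0, 1, 1], [1, 1, 0]
batch = ToyEdgeBatch(
    eids=[0, 1, 2],
    src={'h': [node_h[u] for u in src_ids]},
    dst={'h': [node_h[v] for v in dst_ids]},
    data={'w': [1, 2, 3]},
)
out = edge_udf(batch)
```

With a real graph, the same ``edge_udf`` body would operate on tensors, and DGL would construct the batch internally.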
NodeBatch Node-wise User-defined Function
--------- -------------------------------
One can use a node-wise user-defined function as a reduce function in message passing. It takes
a batch of nodes as input and returns the updated features for each node. It may combine the
current node features and the messages the nodes received. Formally, it takes the following form:
.. code::
    def node_udf(nodes):
        """
        Parameters
        ----------
        nodes : NodeBatch
            A batch of nodes.

        Returns
        -------
        dict[str, tensor]
            The updated node features. It maps a feature name to the corresponding features of
            all nodes in the batch. The order of the features is the same as the order of the
            nodes in the input argument.
        """
The class that can represent a batch of nodes. DGL generates :class:`~dgl.udf.NodeBatch` instances internally, which expose the following
interface for defining ``node_udf``.
.. autosummary:: .. autosummary::
:toctree: ../../generated/ :toctree: ../../generated/
...@@ -46,4 +84,33 @@ The class that can represent a batch of nodes. ...@@ -46,4 +84,33 @@ The class that can represent a batch of nodes.
NodeBatch.mailbox NodeBatch.mailbox
NodeBatch.nodes NodeBatch.nodes
NodeBatch.batch_size NodeBatch.batch_size
NodeBatch.__len__
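The contract of a node-wise UDF can be sketched in the same spirit. ``ToyNodeBatch`` below is a hypothetical stand-in written for this example (it is not part of DGL) and uses nested Python lists in place of tensors: ``data`` holds one value per node, and ``mailbox`` holds one list of received messages per node. The two batches are assumed to correspond to a graph with edges (0->1), (1->1), (1->0) and a feature ``h`` equal to 1 on both nodes.

```python
# Hypothetical stand-in for dgl.udf.NodeBatch; NOT part of DGL.
class ToyNodeBatch:
    def __init__(self, nids, data, mailbox):
        self._nids = nids
        self.data = data        # feature name -> one value per node
        self.mailbox = mailbox  # message name -> one list of messages per node

    def nodes(self):
        return self._nids

    def batch_size(self):
        return len(self._nids)


def node_udf(nodes):
    # Add the sum of the received messages to the current feature of each node.
    new_h = [h + sum(msgs)
             for h, msgs in zip(nodes.data['h'], nodes.mailbox['m'])]
    return {'h': new_h}


# Node 0 receives one message and node 1 receives two,
# so this sketch places them in separate batches.
batch_one_msg = ToyNodeBatch([0], {'h': [1.0]}, {'m': [[1.0]]})
batch_two_msgs = ToyNodeBatch([1], {'h': [1.0]}, {'m': [[1.0, 1.0]]})
out_one = node_udf(batch_one_msg)
out_two = node_udf(batch_two_msgs)
```

Because every node in a batch has the same number of messages, the mailbox of a batch is rectangular, which is what lets a tensor-based UDF reduce over the message dimension.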
Degree Bucketing for Message Passing with User Defined Functions
----------------------------------------------------------------
DGL employs a degree-bucketing mechanism for message passing with UDFs. It groups nodes with
the same in-degree and invokes message passing for each group of nodes. As a result, one should
not make any assumptions about the batch size of :class:`~dgl.udf.NodeBatch` instances.
For a batch of nodes, DGL stacks the incoming messages of each node along the second dimension,
ordered by edge ID. An example goes as follows:
.. code:: python
>>> import dgl
>>> import torch
>>> import dgl.function as fn
>>> g = dgl.graph(([1, 3, 5, 0, 4, 2, 3, 3, 4, 5], [1, 1, 0, 0, 1, 2, 2, 0, 3, 3]))
>>> g.edata['eid'] = torch.arange(10)
>>> def reducer(nodes):
...     print(nodes.mailbox['eid'])
...     return {'n': nodes.mailbox['eid'].sum(1)}
>>> g.update_all(fn.copy_e('eid', 'eid'), reducer)
tensor([[5, 6],
[8, 9]])
tensor([[3, 7, 2],
[0, 1, 4]])
Essentially, node #2 and node #3 are grouped into one bucket with in-degree of 2, and node
#0 and node #1 are grouped into one bucket with in-degree of 3. Within each bucket, the
edges are ordered by the edge IDs for each node.
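The grouping described above can be reproduced with a few lines of plain Python. This is a minimal sketch of the idea only, not DGL's implementation: ``bucket_by_in_degree`` and ``reducer`` are names made up for this example, nodes without incoming edges are skipped, and the entries within a row are listed by increasing edge ID without trying to mirror DGL's internal ordering.

```python
from collections import defaultdict


def bucket_by_in_degree(dst):
    """Group destination nodes by in-degree; the edge ID is the position in `dst`."""
    incoming = defaultdict(list)  # node ID -> IDs of its incoming edges
    for eid, v in enumerate(dst):
        incoming[v].append(eid)

    buckets = defaultdict(lambda: ([], []))  # in-degree -> (node IDs, mailbox rows)
    for v in sorted(incoming):
        nodes, rows = buckets[len(incoming[v])]
        nodes.append(v)
        rows.append(incoming[v])
    return dict(buckets)


def reducer(mailbox_rows):
    # Sum the messages of each node, like `.sum(1)` on a (N, D) tensor.
    return [sum(row) for row in mailbox_rows]


# Destination nodes of the ten edges from the example above.
dst = [1, 1, 0, 0, 1, 2, 2, 0, 3, 3]
buckets = bucket_by_in_degree(dst)

result = {}
for degree in sorted(buckets):
    nodes, rows = buckets[degree]  # reducer is called once per bucket
    for v, total in zip(nodes, reducer(rows)):
        result[v] = total
```

Here ``buckets[2]`` holds nodes 2 and 3 and ``buckets[3]`` holds nodes 0 and 1, matching the two mailboxes printed above. The reducer runs once per bucket, which is why the batch size it sees depends on the degree distribution of the graph.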
...@@ -29,58 +29,190 @@ class EdgeBatch(object): ...@@ -29,58 +29,190 @@ class EdgeBatch(object):
@property @property
def src(self): def src(self):
"""Return the feature data of the source nodes. """Return a view of the source node features for the edges in the batch.
Returns Examples
------- --------
dict with str keys and tensor values The following example uses PyTorch backend.
Features of the source nodes.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a node feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that retrieves the source node features for edges
>>> def edge_udf(edges):
...     # edges.src['h'] is a tensor of shape (E, 1),
...     # where E is the number of edges in the batch.
...     return {'src': edges.src['h']}
>>> # Copy features from source nodes to edges
>>> g.apply_edges(edge_udf)
>>> g.edata['src']
tensor([[1.],
[1.],
[1.]])
>>> # Use edge UDF in message passing, which is equivalent to dgl.function.copy_u
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('src', 'h'))
>>> g.ndata['h']
tensor([[1.],
[2.]])
""" """
return self._src_data return self._src_data
@property @property
def dst(self): def dst(self):
"""Return the feature data of the destination nodes. """Return a view of the destination node features for the edges in the batch.
Returns Examples
------- --------
dict with str keys and tensor values The following example uses PyTorch backend.
Features of the destination nodes.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a node feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.tensor([[0.], [1.]])
>>> # Define a UDF that retrieves the destination node features for edges
>>> def edge_udf(edges):
...     # edges.dst['h'] is a tensor of shape (E, 1),
...     # where E is the number of edges in the batch.
...     return {'dst': edges.dst['h']}
>>> # Copy features from destination nodes to edges
>>> g.apply_edges(edge_udf)
>>> g.edata['dst']
tensor([[1.],
[1.],
[0.]])
>>> # Use edge UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('dst', 'h'))
>>> g.ndata['h']
tensor([[0.],
[2.]])
""" """
return self._dst_data return self._dst_data
@property @property
def data(self): def data(self):
"""Return the edge feature data. """Return a view of the edge features for the edges in the batch.
Returns Examples
------- --------
dict with str keys and tensor values The following example uses PyTorch backend.
Features of the edges.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set an edge feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.edata['h'] = torch.tensor([[1.], [1.], [1.]])
>>> # Define a UDF that retrieves the feature 'h' for all edges
>>> def edge_udf(edges):
...     # edges.data['h'] is a tensor of shape (E, 1),
...     # where E is the number of edges in the batch.
...     return {'data': edges.data['h']}
>>> # Make a copy of the feature with name 'data'
>>> g.apply_edges(edge_udf)
>>> g.edata['data']
tensor([[1.],
[1.],
[1.]])
>>> # Use edge UDF in message passing, which is equivalent to dgl.function.copy_e
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('data', 'h'))
>>> g.ndata['h']
tensor([[1.],
[2.]])
""" """
return self._edge_data return self._edge_data
def edges(self): def edges(self):
"""Return the edges contained in this batch. """Return the edges in the batch
Returns Returns
------- -------
Tensor (U, V, EID) : (Tensor, Tensor, Tensor)
Source node IDs. The edges in the batch. For each :math:`i`, :math:`(U[i], V[i])` is an edge
Tensor from :math:`U[i]` to :math:`V[i]` with ID :math:`EID[i]`.
Destination node IDs.
Tensor Examples
Edge IDs. --------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> # Define a UDF that retrieves and concatenates the end nodes of the edges
>>> def edge_udf(edges):
...     src, dst, _ = edges.edges()
...     return {'uv': torch.stack([src, dst], dim=1).float()}
>>> # Create a feature 'uv' with the end nodes of the edges
>>> g.apply_edges(edge_udf)
>>> g.edata['uv']
tensor([[0., 1.],
[1., 1.],
[1., 0.]])
>>> # Use edge UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('uv', 'h'))
>>> g.ndata['h']
tensor([[1., 0.],
[1., 2.]])
""" """
u, v = self._graph.find_edges(self._eid, etype=self.canonical_etype) u, v = self._graph.find_edges(self._eid, etype=self.canonical_etype)
return u, v, self._eid return u, v, self._eid
def batch_size(self): def batch_size(self):
"""Return the number of edges in this edge batch. """Return the number of edges in the batch.
Returns Returns
------- -------
int int
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> # Define a UDF that returns one for each edge
>>> def edge_udf(edges):
...     return {'h': torch.ones(edges.batch_size(), 1)}
>>> # Create a feature 'h'
>>> g.apply_edges(edge_udf)
>>> g.edata['h']
tensor([[1.],
[1.],
[1.]])
>>> # Use edge UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(edge_udf, fn.sum('h', 'h'))
>>> g.ndata['h']
tensor([[1.],
[2.]])
""" """
return len(self._eid) return len(self._eid)
...@@ -124,45 +256,137 @@ class NodeBatch(object): ...@@ -124,45 +256,137 @@ class NodeBatch(object):
@property @property
def data(self): def data(self):
"""Return the node feature data. """Return a view of the node features for the nodes in the batch.
Returns Examples
------- --------
dict with str keys and tensor values The following example uses PyTorch backend.
Features of the nodes.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received and the original feature
>>> # for each node
>>> def node_udf(nodes):
...     # nodes.data['h'] is a tensor of shape (N, 1),
...     # nodes.mailbox['m'] is a tensor of shape (N, D, 1),
...     # where N is the number of nodes in the batch,
...     # D is the number of messages received per node for this node batch
...     return {'h': nodes.data['h'] + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[2.],
[3.]])
""" """
return self._data return self._data
@property @property
def mailbox(self): def mailbox(self):
"""Return the received messages. """Return a view of the messages received.
If no messages received, a ``None`` will be returned. Examples
--------
The following example uses PyTorch backend.
Returns >>> import dgl
------- >>> import torch
dict or None
The messages nodes received. If dict, the keys are >>> # Instantiate a graph and set a feature 'h'
``str`` and the values are ``tensor``. >>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received and the original feature
>>> # for each node
>>> def node_udf(nodes):
...     # nodes.data['h'] is a tensor of shape (N, 1),
...     # nodes.mailbox['m'] is a tensor of shape (N, D, 1),
...     # where N is the number of nodes in the batch,
...     # D is the number of messages received per node for this node batch
...     return {'h': nodes.data['h'] + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[2.],
[3.]])
""" """
return self._msgs return self._msgs
def nodes(self): def nodes(self):
"""Return the nodes contained in this batch. """Return the nodes in the batch.
Returns Returns
------- -------
tensor NID : Tensor
The nodes. The IDs of the nodes in the batch. :math:`NID[i]` gives the ID of
the i-th node.
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph and set a feature 'h'
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received and the original ID
>>> # for each node
>>> def node_udf(nodes):
...     # nodes.nodes() is a tensor of shape (N,),
...     # nodes.mailbox['m'] is a tensor of shape (N, D, 1),
...     # where N is the number of nodes in the batch,
...     # D is the number of messages received per node for this node batch
...     return {'h': nodes.nodes().unsqueeze(-1).float() + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[1.],
[3.]])
""" """
return self._nodes return self._nodes
def batch_size(self): def batch_size(self):
"""Return the number of nodes in this batch. """Return the number of nodes in the batch.
Returns Returns
------- -------
int int
Examples
--------
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
>>> # Instantiate a graph
>>> g = dgl.graph((torch.tensor([0, 1, 1]), torch.tensor([1, 1, 0])))
>>> g.ndata['h'] = torch.ones(2, 1)
>>> # Define a UDF that computes the sum of the messages received for each node
>>> # and increments the result by 1
>>> def node_udf(nodes):
...     return {'h': torch.ones(nodes.batch_size(), 1) + nodes.mailbox['m'].sum(1)}
>>> # Use node UDF in message passing
>>> import dgl.function as fn
>>> g.update_all(fn.copy_u('h', 'm'), node_udf)
>>> g.ndata['h']
tensor([[2.],
[3.]])
""" """
return len(self._nodes) return len(self._nodes)
......