"graphbolt/vscode:/vscode.git/clone" did not exist on "219c9f1a2910a1f5d22f56fede28252d55eee277"
L0_neighbor_sampling_overview.py 4.04 KB
Newer Older
1
2
3
4
"""
Introduction to Neighbor Sampling for GNN Training
==================================================

In :doc:`previous tutorials <../blitz/1_introduction>` you have learned how to
train GNNs by computing the representations of all nodes on a graph.
However, sometimes your graph is too large for the computation over all
of its nodes to fit on a single GPU.

By the end of this tutorial, you will be able to

-  Understand the pipeline of stochastic GNN training.
-  Understand what neighbor sampling is and why it yields a bipartite
   graph for each GNN layer.
"""


######################################################################
# Message Passing Review
# ----------------------
#
# Recall that in `Gilmer et al. <https://arxiv.org/abs/1704.01212>`__
# (also in :doc:`message passing tutorial <../blitz/3_message_passing>`), the
# message passing formulation is as follows:
#
# .. math::
#
#
#    m_{u\to v}^{(l)} = M^{(l)}\left(h_v^{(l-1)}, h_u^{(l-1)}, e_{u\to v}^{(l-1)}\right)
#
# .. math::
#
#
#    m_{v}^{(l)} = \sum_{u\in\mathcal{N}(v)}m_{u\to v}^{(l)}
#
# .. math::
#
#
#    h_v^{(l)} = U^{(l)}\left(h_v^{(l-1)}, m_v^{(l)}\right)
#
# where DGL calls :math:`M^{(l)}` the *message function*, :math:`\sum` the
# *reduce function* and :math:`U^{(l)}` the *update function*. Note that
# :math:`\sum` here can represent any function and is not necessarily a
# summation.
#
# Essentially, the :math:`l`-th layer representation of a single node
# depends on the :math:`(l-1)`-th layer representation of the same node,
# as well as the :math:`(l-1)`-th layer representation of the neighboring
# nodes. Those :math:`(l-1)`-th layer representations then depend on the
# :math:`(l-2)`-th layer representation of those nodes, as well as their
# neighbors.
#
# The following animation shows how a 2-layer GNN is supposed to compute
# the output of node 5:
#
# |image1|
#
# You can see that to compute the second-layer representation of node 5,
# you will need its direct neighbors’ first-layer representations (colored
# in yellow), which in turn require the representations of their own
# direct neighbors (i.e. node 5’s second-hop neighbors, colored in green).
#
# .. |image1| image:: https://data.dgl.ai/tutorial/img/sampling.gif
#
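
######################################################################
# As a concrete (if simplified) illustration, the following sketch runs one
# round of this message passing with DGL's built-in message and reduce
# functions on a small random graph.  The graph, the feature size, and the
# linear update function are placeholders chosen only for illustration.
#

import dgl
import dgl.function as fn
import torch

g = dgl.rand_graph(100, 500)          # a random graph standing in for your data
g.ndata["h"] = torch.randn(100, 16)   # (l-1)-th layer representations

# M: copy each source node's representation onto the edge as the message;
# sum: aggregate the messages arriving at each destination node.
g.update_all(fn.copy_u("h", "m"), fn.sum("m", "h_agg"))

# U: combine each node's own representation with its aggregated message.
update = torch.nn.Linear(32, 16)
g.ndata["h_new"] = update(torch.cat([g.ndata["h"], g.ndata["h_agg"]], dim=1))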


######################################################################
# Neighbor Sampling Overview
# --------------------------
#
# You can also see from the previous example that computing the
# representations of a small number of nodes often requires input features of a
# significantly larger number of nodes. Taking all neighbors for message
# aggregation is often too costly since the nodes needed for input
# features would easily cover a large portion of the graph, especially for
# real-world graphs which are often
# `scale-free <https://en.wikipedia.org/wiki/Scale-free_network>`__.
#
# Neighbor sampling addresses this issue by selecting a subset of the
# neighbors to perform aggregation. For instance, to compute
# :math:`\boldsymbol{h}_8^{(2)}`, you can choose two of the neighbors
# instead of all of them to aggregate, as in the following animation:
#
# |image2|
#
# You can see that this method requires far fewer nodes for message
# passing in a single minibatch.
#
# .. |image2| image:: https://data.dgl.ai/tutorial/img/bipartite.gif
#
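
######################################################################
# As a small sketch of the idea (the graph and seed node below are random
# placeholders), ``dgl.sampling.sample_neighbors`` keeps at most a fixed
# number of in-neighbors of the given seed nodes and returns the resulting
# subgraph:
#

import dgl
import torch

g = dgl.rand_graph(1000, 5000)    # a random graph standing in for your data
seed_nodes = torch.tensor([8])    # the node whose output we want to compute

# Keep at most 2 in-edges of node 8; all other edges are dropped.
sg = dgl.sampling.sample_neighbors(g, seed_nodes, 2)
print(sg.num_edges())             # at most 2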


######################################################################
# You can also notice that the computation dependencies in the animation
# above can be described as a series of *bipartite graphs*.
# The output nodes are on one side and all the nodes necessary for inputs
# are on the other side. The arrows indicate how the sampled neighbors
# propagate messages to the output nodes.
#
# Note that some GNN modules, such as `SAGEConv`, need the output nodes'
# features from the previous layer to compute the outputs.  Without
# loss of generality, DGL always includes the output nodes themselves
# among the input nodes.
#
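
######################################################################
# A minimal sketch, assuming a recent DGL version that provides
# ``dgl.dataloading.NeighborSampler``, of how two layers of neighbor
# sampling produce one bipartite *block* per GNN layer.  The graph and the
# seed node below are placeholders for illustration only.
#

import dgl
import torch

g = dgl.rand_graph(1000, 5000)
seed_nodes = torch.tensor([8])

# Sample at most 2 neighbors per node for each of the 2 GNN layers.
sampler = dgl.dataloading.NeighborSampler([2, 2])
input_nodes, output_nodes, blocks = sampler.sample_blocks(g, seed_nodes)

for block in blocks:
    # Each block is bipartite: the nodes needed as inputs sit on the source
    # side and the nodes being computed sit on the destination side.  The
    # destination nodes also appear first among the source nodes.
    print(block.num_src_nodes(), "->", block.num_dst_nodes())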


######################################################################
# What’s next?
# ------------
#
# :doc:`Stochastic GNN Training for Node Classification in
# DGL <L1_large_node_classification>`
#