.. _guide-minibatch:

Chapter 6: Stochastic Training on Large Graphs
=======================================================

If we have a massive graph with, say, millions or even billions of nodes
or edges, full-graph training as described in
:ref:`guide-training`
usually does not work. Consider an :math:`L`-layer graph convolutional
network with hidden state size :math:`H` running on an :math:`N`-node
graph. Storing the intermediate hidden states requires :math:`O(NLH)`
memory, which easily exceeds one GPU's capacity for large :math:`N`.

This section describes how to perform stochastic minibatch training,
where we do not need to fit the features of all nodes into GPU memory.

Overview of Neighborhood Sampling Approaches
--------------------------------------------

Neighborhood sampling methods generally work as follows. For each
gradient descent step, we select a minibatch of nodes whose final
representations at the :math:`L`-th layer are to be computed. We then
take all or some of their neighbors at the :math:`(L-1)`-th layer. This
process continues until we reach the input. This iterative process
builds the dependency graph starting from the output and working
backwards to the input, as the figure below shows:

.. figure:: https://i.imgur.com/Y0z0qcC.png
   :alt: Dependency graph built by neighborhood sampling

   Neighborhood sampling builds the dependency graph backwards from the
   output nodes to the input.

With this, one can reduce the workload and computational resources
required to train a GNN on a large graph.

DGL provides a few neighborhood samplers and a pipeline for training a
GNN with neighborhood sampling, as well as ways to customize your
sampling strategies.
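
The layer-by-layer expansion described above can be sketched in plain
Python. This is a conceptual illustration only, not DGL's API: the
adjacency list, ``sample_blocks`` function, fanout, and seed nodes are
all hypothetical, and DGL's actual samplers operate on its own graph
structures.

.. code:: python

    import random

    # Hypothetical adjacency list: node -> list of in-neighbors
    # (i.e., the message sources for that node).
    graph = {
        0: [1, 2],
        1: [2, 3],
        2: [3, 4],
        3: [4],
        4: [0],
    }

    def sample_blocks(graph, seeds, num_layers, fanout, rng=random):
        """For each GNN layer, sample up to ``fanout`` in-neighbors of the
        current frontier, working backwards from the output nodes.
        Returns the node set needed at each layer, input layer first."""
        layers = [set(seeds)]
        for _ in range(num_layers):
            frontier = set()
            for node in layers[0]:
                neighbors = graph.get(node, [])
                k = min(fanout, len(neighbors))
                frontier.update(rng.sample(neighbors, k))
            # A node's previous-layer representation also feeds its own
            # next-layer representation, so keep the current layer's nodes.
            layers.insert(0, frontier | layers[0])
        return layers

    layers = sample_blocks(graph, seeds=[0], num_layers=2, fanout=2)
    # layers[-1] is the minibatch of output nodes; layers[0] contains
    # every input node whose features must be loaded.

Note that each layer's node set is a superset of the next layer's, and
only the nodes in ``layers[0]`` — rather than all :math:`N` nodes — need
their input features on the GPU for this minibatch.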

Roadmap
-----------

The chapter starts with sections on training GNNs stochastically under
different scenarios.

* :ref:`guide-minibatch-node-classification-sampler`
* :ref:`guide-minibatch-edge-classification-sampler`
* :ref:`guide-minibatch-link-classification-sampler`

The remaining sections cover more advanced topics, suitable for those who
wish to develop new sampling algorithms or new GNN modules compatible
with mini-batch training, and to understand how evaluation and inference
can be conducted in mini-batches.

* :ref:`guide-minibatch-customizing-neighborhood-sampler`
* :ref:`guide-minibatch-custom-gnn-module`
* :ref:`guide-minibatch-inference`

.. toctree::
    :maxdepth: 1
    :hidden:
    :glob:

    minibatch-node
    minibatch-edge
    minibatch-link
    minibatch-custom-sampler
    minibatch-nn
    minibatch-inference