.. _guide-minibatch:

Chapter 6: Stochastic Training on Large Graphs
=======================================================

:ref:`(Chinese version) <guide_cn-minibatch>`

If we have a massive graph with, say, millions or even billions of nodes
or edges, full-graph training as described in :ref:`guide-training`
usually will not work. Consider an :math:`L`-layer graph convolutional
network with hidden state size :math:`H` running on an :math:`N`-node
graph. Storing the intermediate hidden states alone requires
:math:`O(NLH)` memory, easily exceeding one GPU’s capacity for large
:math:`N`.
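
To see the scale concretely, here is a back-of-the-envelope estimate.
The node count, layer count, and hidden size below are hypothetical,
chosen only for illustration:

.. code:: python

    # Hypothetical sizes: a 3-layer GCN with hidden size 128 on a graph
    # with 100 million nodes, keeping float32 activations at every layer.
    N, L, H = 100_000_000, 3, 128
    bytes_per_float = 4  # float32

    # The O(N * L * H) intermediate hidden states kept for backpropagation.
    hidden_state_bytes = N * L * H * bytes_per_float
    print(f"{hidden_state_bytes / 2**30:.0f} GiB")  # ~143 GiB, beyond a single GPU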

This chapter introduces stochastic minibatch training, where we do not
have to fit the features of all the nodes into GPU memory.

Overview of Neighborhood Sampling Approaches
--------------------------------------------

Neighborhood sampling methods generally work as follows. For each
gradient descent step, we select a minibatch of nodes whose final
representations at the :math:`L`-th layer are to be computed. We then
take all or some of their neighbors at layer :math:`L-1`. This
process continues until we reach the input. This iterative process
builds the dependency graph starting from the output and working
backwards to the input, as the figure below shows:

.. figure:: https://data.dgl.ai/asset/image/guide_6_0_0.png
   :alt: Imgur


With this, one can greatly reduce the memory and computation required
to train a GNN on a large graph.

DGL provides a few neighborhood samplers and a pipeline for training a
GNN with neighborhood sampling, as well as ways to customize your
sampling strategies.
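
As a taste of what this pipeline looks like, below is a minimal sketch
of iterating over sampled minibatches. It assumes a ``DGLGraph`` ``g``
and a tensor of training node IDs ``train_nids`` already exist; the
fanouts and batch size are arbitrary, and depending on the DGL version
the entry points may be named ``NeighborSampler`` and ``DataLoader``
instead:

.. code:: python

    import dgl

    # Assumed to exist already: a DGLGraph ``g`` and a tensor
    # ``train_nids`` holding the IDs of the training nodes.
    # Sample up to 10 neighbors for the first layer and 15 for the second.
    sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 15])
    dataloader = dgl.dataloading.NodeDataLoader(
        g, train_nids, sampler,
        batch_size=1024, shuffle=True, drop_last=False)

    for input_nodes, output_nodes, blocks in dataloader:
        # ``blocks`` holds one message flow graph (MFG) per layer, built
        # backwards from ``output_nodes`` to the required ``input_nodes``.
        ...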

Roadmap
-----------

The chapter starts with sections on how to train GNNs stochastically
under different scenarios:

* :ref:`guide-minibatch-node-classification-sampler`
* :ref:`guide-minibatch-edge-classification-sampler`
* :ref:`guide-minibatch-link-classification-sampler`

The remaining sections cover more advanced topics, suitable for those
who wish to develop new sampling algorithms or new GNN modules
compatible with mini-batch training, and to understand how evaluation
and inference can be conducted in mini-batches:

* :ref:`guide-minibatch-customizing-neighborhood-sampler`
* :ref:`guide-minibatch-custom-gnn-module`
* :ref:`guide-minibatch-inference`


.. toctree::
    :maxdepth: 1
    :hidden:
    :glob:

    minibatch-node
    minibatch-edge
    minibatch-link
    minibatch-custom-sampler
    minibatch-nn
    minibatch-inference