OpenDAS / dgl

Commit dd09f15f
Authored Nov 29, 2019 by John Andrilla
Committed by Minjie Wang, Nov 30, 2019

[Doc] Stochastic steady-state embedding (SSE) Edit for readability

Revisions for grammar and style

parent 0897548a
Showing 1 changed file with 64 additions and 64 deletions

tutorials/models/1_gnn/8_sse_mx.py  +64 -64
"""
"""
.. _model-sse:
.. _model-sse:
Stochastic
S
teady-state
E
mbedding (SSE)
Stochastic
s
teady-state
e
mbedding (SSE)
=======================================
=======================================
**Author**: Gai Yu, Da Zheng, Quan Gan, Jinjing Zhou, Zheng Zhang
**Author**: Gai Yu, Da Zheng, Quan Gan, Jinjing Zhou, Zheng Zhang
@@ -20,15 +20,15 @@ Stochastic Steady-state Embedding (SSE)
 # \newcommand{\calv}{{\mathcal{V}}}
 # \newcommand{\until}{\text{until}\ }
 #
-# In this tutorial we implement in DGL with MXNet
+# In this tutorial, you learn how to use the Deep Graph Library (DGL) with MXNet to implement the following:
 #
-# - Simple steady-state algorithms with `stochastic steady-state
+# - Simple, steady-state algorithms with `stochastic steady-state
 #   embedding <https://www.cc.gatech.edu/~hdai8/pdf/equilibrium_embedding.pdf>`__
-#   (SSE), and
+#   (SSE)
-# - Training with subgraph sampling.
+# - Training with subgraph sampling
 #
-# Subgraph sampling is a generic technique to scale up learning to
+# Subgraph sampling is a technique to scale-up learning to
-# gigantic graphs (e.g. with billions of nodes and edges). It can apply to
+# gigantic graphs (for example, billions of nodes and edges). Subgraph sampling can apply to
 # other algorithms, such as :doc:`Graph convolution
 # network <1_gcn>`
 # and :doc:`Relational graph convolution
@@ -38,19 +38,19 @@ Stochastic Steady-state Embedding (SSE)
 # -----------------------
 #
 # Many algorithms for graph analytics are iterative procedures that
-# terminate when some steady states are reached. Examples include
+# end when a steady state is reached. Examples include
-# PageRank, and mean-field inference on Markov Random Fields.
+# PageRank or mean-field inference on Markov random fields.
 #
 # Flood-fill algorithm
 # ~~~~~~~~~~~~~~~~~~~~
 #
-# *Flood-fill algorithm* (or *infection* algorithm as in Dai et al.) can
+# A *Flood-fill algorithm* (or *infection* algorithm) can
-# also be seen as such a procedure. Specifically, the problem is that
+# also be seen as a procedure. Specifically, the problem is that
 # given a graph :math:`\calg = (\calv, \cale)` and a source node
-# :math:`s \in \calv`, we need to mark all nodes that can be reached from
+# :math:`s \in \calv`, you need to mark all nodes that can be reached from
 # :math:`s`. Let :math:`\calv = \{1, ..., n\}` and let :math:`y_v`
 # indicate whether a node :math:`v` is marked. The flood-fill algorithm
-# proceeds as follows:
+# proceeds as follows.
 #
 # .. math::
 #
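The marking update itself (the equation elided after the ``.. math::`` directive above) is simple enough to state outside DGL. As a point of reference, here is a minimal plain-Python sketch of the flood-fill iteration described in this hunk; the adjacency-list representation and the ``flood_fill`` helper name are illustrative, not part of the tutorial.

import_free_sketch = True  # pure Python, no external dependencies

def flood_fill(adj, source):
    # Repeatedly mark any node that has at least one marked neighbor,
    # until a full pass changes nothing (the steady state).
    n = len(adj)
    y = [0] * n
    y[source] = 1
    changed = True
    while changed:
        changed = False
        for v in range(n):
            if y[v] == 0 and any(y[u] for u in adj[v]):
                y[v] = 1
                changed = True
    return y

# Two disjoint 3-node chains: 0-1-2 and 3-4-5. Only the first chain is reachable.
adj = [[1], [0, 2], [1], [4], [3, 5], [4]]
print(flood_fill(adj, source=0))   # [1, 1, 1, 0, 0, 0]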
@@ -67,7 +67,7 @@ Stochastic Steady-state Embedding (SSE)
 #
 # The flood-fill algorithm first marks the source node :math:`s`, and then
 # repeatedly marks nodes with one or more marked neighbors until no node
-# needs to be marked, i.e. the steady state is reached.
+# needs to be marked, that is, the steady state is reached.
 #
 # Flood-fill algorithm and steady-state operator
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -80,14 +80,14 @@ Stochastic Steady-state Embedding (SSE)
 # In the case of the flood-fill algorithm, :math:`\hat\calt = \max`. The
 # condition “:math:`\until \bfy^{(t + 1)} = \bfy^{(t)}`” in :math:`(3)`
 # implies that :math:`\bfy^*` is the solution to the problem if and only
-# if :math:`\bfy^* = \calt (\bfy^*)`, i.e.\ :math:`\bfy^*` is steady
+# if :math:`\bfy^* = \calt (\bfy^*)`, that is\ :math:`\bfy^*` is steady
 # under :math:`\calt`. Thus we call :math:`\calt` the *steady-state
 # operator*.
 #
-# Implementation
+# Implementing a flood-fill algorithm
-# ~~~~~~~~~~~~~~
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# We can easily implement flood-fill in DGL:
+# You can implement flood-fill in DGL with the following code.

 import mxnet as mx
 import os
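The body of the tutorial's ``T`` operator is elided in this diff (only ``def T(g):`` in the next hunk header and its ``return`` line survive). For orientation, the sketch below shows what a max-over-neighbors steady-state operator can look like with DGL message passing; the exact message and reduce functions are an assumption, not necessarily the tutorial's code.

import mxnet as mx
import dgl

def T(g):
    # Each node gathers the marks of its neighbors, takes the maximum over
    # the mailbox, and stays marked once it has been marked.
    def message_func(edges):
        return {'m': edges.src['y']}

    def reduce_func(nodes):
        m = mx.nd.max(nodes.mailbox['m'], axis=1)
        return {'y': mx.nd.maximum(m, nodes.data['y'])}

    g.update_all(message_func, reduce_func)
    return g.ndata['y']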
@@ -108,8 +108,8 @@ def T(g):
     return g.ndata['y']

 ##############################################################################
-# To run the algorithm, let’s create a ``DGLGraph`` consisting of two
+# To run the algorithm, create a ``DGLGraph`` as in the example code here, consisting of two
-# disjoint chains, each with 10 nodes, and initialize it as specified in
+# disjointed chains, each with ten nodes, and initialize it as specified in
 # Eq. :math:`(0)` and Eq. :math:`(1)`.
 #

 import networkx as nx
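The graph-construction code after ``import networkx as nx`` is elided here. A sketch of one way to build the two disjoint ten-node chains and initialize the marks (the source-node choice and the NetworkX helpers are assumptions consistent with the ``y[s] = 1`` and ``g.ndata['y'] = y`` lines visible in the next hunk):

import networkx as nx
import mxnet as mx
import dgl

# Two disjoint 10-node chains combined into one graph, then wrapped in a DGLGraph.
nxg = nx.disjoint_union(nx.path_graph(10), nx.path_graph(10))
g = dgl.DGLGraph(nxg)

# Eq. (0)/(1): every node starts unmarked except the source node s.
s = 0
y = mx.nd.zeros((g.number_of_nodes(), 1))
y[s] = 1
g.ndata['y'] = y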
@@ -134,8 +134,8 @@ y[s] = 1
 g.ndata['y'] = y

 ##############################################################################
-# Now let’s apply ``T`` to ``g`` until convergence. You can see that nodes
+# Now apply ``T`` to ``g`` until convergence. You can see that nodes
-# reachable from ``s`` are gradually “infected” (marked).
+# reachable from ``s`` are gradually infected (marked).
 #

 while True:
     prev_y = g.ndata['y']
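The rest of this loop body is elided by the diff. A minimal sketch of the convergence check, assuming ``T`` returns the updated marks as in the sketch above:

while True:
    prev_y = g.ndata['y']
    cur_y = T(g)
    # Stop once a full application of T changes nothing (the steady state).
    if (cur_y == prev_y).sum().asscalar() == g.number_of_nodes():
        break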
@@ -154,22 +154,22 @@ while True:
 # Neural flood-fill algorithm
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# Now let’s consider designing a neural network that simulates the
+# Next, you can design a neural network that simulates the
 # flood-fill algorithm.
 #
-# - Instead of using :math:`\calt` to update the states of nodes, we use
+# Instead of using :math:`\calt` to update the states of nodes, use
 #   :math:`\calt_\Theta`, a graph neural network (and
 #   :math:`\hat\calt_\Theta` instead of :math:`\hat\calt`).
-# - The state of a node :math:`v` is no longer a boolean value
+# The state of a node :math:`v` is no longer a Boolean value
 #   (:math:`y_v`), but, an embedding :math:`h_v` (a vector of some
 #   reasonable dimension, say, :math:`H`).
-# - We also associate a feature vector :math:`x_v` with :math:`v`. For
+# You can also associate a feature vector :math:`x_v` with :math:`v`. For
-#   the flood-fill algorithm, we simply use the one-hot encoding of a
+#   the flood-fill algorithm, simply use the one-hot encoding of a
 #   node’s ID as its feature vector, so that our algorithm can
 #   distinguish different nodes.
-# - We only iterate :math:`T` times instead of iterating until the
+# Only iterate :math:`T` times instead of iterating until the
 #   steady-state condition is satisfied.
-# - After iteration, we mark the nodes by passing the node embedding
+# After iteration, mark the nodes by passing the node embedding
 #   :math:`h_v` into another neural network to produce a probability
 #   :math:`p_v` of whether the node is reachable.
 #
@@ -198,7 +198,7 @@ while True:
 # Like the naive algorithm, the neural flood-fill algorithm can be
 # partitioned into a ``message_func`` (neighborhood information gathering)
 # and a ``reduce_func`` (:math:`\hat\calt_\Theta`). We define
-# :math:`\hat\calt_\Theta` as a callable ``gluon.Block``:
+# :math:`\hat\calt_\Theta` as a callable ``gluon.Block`` as in this example code.
 #

 import mxnet.gluon as gluon
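The ``FullGraphSteadyStateOperator`` definition itself is elided apart from its name in the next hunk header. The sketch below shows one plausible shape for such a block; the layer sizes and the concatenation of node features with summed neighbor messages are assumptions in the spirit of the description above, not the tutorial's exact code.

import mxnet as mx
import mxnet.gluon as gluon

class FullGraphSteadyStateOperator(gluon.Block):
    def __init__(self, n_hidden, activation, **kwargs):
        super(FullGraphSteadyStateOperator, self).__init__(**kwargs)
        with self.name_scope():
            self.dense1 = gluon.nn.Dense(n_hidden, activation=activation)
            self.dense2 = gluon.nn.Dense(n_hidden, activation=activation)

    def forward(self, g):
        def message_func(edges):
            # Each neighbor sends its feature vector and current embedding.
            return {'m': mx.nd.concat(edges.src['x'], edges.src['h'], dim=1)}

        def reduce_func(nodes):
            # Sum the messages, append the node's own features, apply a small MLP.
            m = mx.nd.sum(nodes.mailbox['m'], axis=1)
            z = mx.nd.concat(nodes.data['x'], m, dim=1)
            return {'h': self.dense2(self.dense1(z))}

        g.update_all(message_func, reduce_func)
        return g.ndata['h']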
@@ -232,7 +232,7 @@ class FullGraphSteadyStateOperator(gluon.Block):
 #
 # h_v^{(t + 1)} \leftarrow (1 - \alpha) h_v^{(t)} + \alpha \left[\calt_\Theta (h_0^{(t)}, ..., h_n^{(t)})\right]_v \qquad 0 < \alpha < 1
 #
-# Putting these together we have:
+# Putting these together you have:
 #

 def update_embeddings(g, steady_state_operator):
@@ -240,7 +240,7 @@ def update_embeddings(g, steady_state_operator):
     next_h = steady_state_operator(g)
     g.ndata['h'] = (1 - alpha) * prev_h + alpha * next_h
 ##############################################################################
-# The last step involves implementing the predictor:
+# The last step involves implementing the predictor.
 #
 class Predictor(gluon.Block):
     def __init__(self, n_hidden, activation, **kwargs):
@@ -254,7 +254,7 @@ class Predictor(gluon.Block):
 ##############################################################################
 # The predictor’s decision rule is just a decision rule for binary
-# classification:
+# classification.
 #
 # .. math::
 #
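``Predictor``'s body and the decision-rule formula are elided above. A minimal sketch consistent with the ``__init__`` signature shown in the diff; the two-unit output layer and the argmax decision rule are assumptions.

class Predictor(gluon.Block):
    def __init__(self, n_hidden, activation, **kwargs):
        super(Predictor, self).__init__(**kwargs)
        with self.name_scope():
            self.dense1 = gluon.nn.Dense(n_hidden, activation=activation)
            self.dense2 = gluon.nn.Dense(2)   # logits for unreachable / reachable

    def forward(self, h):
        return self.dense2(self.dense1(h))

# Decision rule for binary classification: pick the class with the larger logit,
# e.g. y_hat = mx.nd.argmax(predictor(h), axis=1)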
@@ -274,12 +274,12 @@ class Predictor(gluon.Block):
 # Efficient semi-supervised learning on graph
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# In our setting, we can observe the entire structure of one fixed graph as well
+# In this setting, you can observe the entire structure of one fixed graph as well
-# as the feature vector of each node. However, we only have access to the
+# as the feature vector of each node. However, you might only have access to the
-# labels of some (very few) of the nodes. We will train the neural
+# labels of some (very few) of the nodes. Train the neural
 # flood-fill algorithm in this setting as well.
 #
-# We initialize feature vectors ``'x'`` and node embeddings ``'h'``
+# Initialize feature vectors ``'x'`` and node embeddings ``'h'``
 # first.
 #
 import numpy as np
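The initialization code after ``import numpy as np`` is elided. A sketch of one way to set up one-hot features ``'x'``, initial embeddings ``'h'``, and a train/test split of the nodes; the embedding size ``n_hidden`` and the random split are assumptions (``nodes_test = np.where(test_bitmap)[0]`` mirrors a later hunk header).

import numpy as np
import mxnet as mx

n = g.number_of_nodes()
n_hidden = 16                                  # illustrative embedding size

g.ndata['x'] = mx.nd.eye(n)                    # one-hot encoding of node IDs
g.ndata['h'] = mx.nd.zeros((n, n_hidden))      # initial node embeddings

# Keep only a few labelled nodes for training; the rest are held out for testing.
test_bitmap = np.random.rand(n) > 0.5
nodes_train = np.where(~test_bitmap)[0]
nodes_test = np.where(test_bitmap)[0]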
@@ -312,7 +312,7 @@ nodes_test = np.where(test_bitmap)[0]
 # :math:`T` times. These updated node embeddings are fed to :math:`g_\Phi`
 # as in Eq. :math:`(5)`. These steps are fully differentiable and the
 # neural flood-fill algorithm can thus be trained in an end-to-end
-# fashion. Denoting the binary cross-entropy loss by :math:`l`, we have a
+# fashion. Denoting the binary cross-entropy loss by :math:`l`, you have a
 # loss function in the following form:
 #
 # .. math::
@@ -320,14 +320,14 @@ nodes_test = np.where(test_bitmap)[0]
 #
 # \call (\Theta, \Phi) = \frac1{\left|\calv_y\right|} \sum_{v \in \calv_y} l \left(g_\Phi \left(\left[\calt_\Theta^T (h_1^{(0)}, ..., h_n^{(0)})\right]_v \right), y_v\right) \tag{7}
 #
-# After computing :math:`\call (\Theta, \Phi)`, we can update
+# After computing :math:`\call (\Theta, \Phi)`, you can update
 # :math:`\Theta` and :math:`\Phi` using the gradients
 # :math:`\nabla_\Theta \call (\Theta, \Phi)` and
 # :math:`\nabla_\Phi \call (\Theta, \Phi)`. One problem with Eq.
 # :math:`(7)` is that computing :math:`\nabla_\Theta \call (\Theta, \Phi)`
 # and :math:`\nabla_\Phi \call (\Theta, \Phi)` requires back-propagating
 # :math:`T` times through :math:`\calt_\Theta`, which may be slow in
-# practice. So we adopt the following “steady-state” loss function, which
+# practice. So, adopt the following steady-state loss function, which
 # only incorporates the last node embedding update in back-propagation:
 #
 # .. math::
@@ -350,17 +350,17 @@ def fullgraph_update_parameters(g, label_nodes, steady_state_operator, predictor
     trainer.step(n)  # divide gradients by the number of labelled nodes
     return loss.asnumpy()[0]
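Only the last two lines of ``fullgraph_update_parameters`` survive in this hunk. A sketch of how the steady-state loss described above could be wired up in Gluon follows; the ``detach`` trick (so that only the final embedding update is back-propagated), the softmax cross-entropy call, and the variable names are assumptions, not the tutorial's exact code.

def fullgraph_update_parameters(g, label_nodes, steady_state_operator,
                                predictor, trainer):
    n = len(label_nodes)
    # Cut the autograd history so only the final update is back-propagated.
    g.ndata['h'] = g.ndata['h'].detach()
    with mx.autograd.record():
        steady_state_operator(g)                      # one differentiable update
        z = predictor(g.ndata['h'][label_nodes])      # logits for labelled nodes
        y = g.ndata['y'][label_nodes].reshape((n,))
        loss = mx.nd.softmax_cross_entropy(z, y) / n
    loss.backward()
    trainer.step(n)  # divide gradients by the number of labelled nodes
    return loss.asnumpy()[0]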
 ##############################################################################
-# We are now ready to implement the training procedure, which is in two
+# You are now ready to implement the training procedure, which is in two
-# phases:
+# phases.
 #
 # - The first phase updates node embeddings several times using
 #   :math:`\calt_\Theta` to attain an approximately steady state
 # - The second phase trains :math:`\calt_\Theta` and :math:`g_\Phi` using
 #   this steady state.
 #
-# Note that we update the node embeddings of :math:`\calg` instead of
+# You update the node embeddings of :math:`\calg` instead of
 # :math:`\calg_y` only. The reason lies in the semi-supervised learning
-# setting: to do inference on :math:`\calg`, we need node embeddings on
+# setting. To do inference on :math:`\calg`, you need node embeddings on
 # :math:`\calg` instead of on :math:`\calg_y` only.
 #

 def train(g, label_nodes, steady_state_operator, predictor, trainer):
@@ -373,17 +373,17 @@ def train(g, label_nodes, steady_state_operator, predictor, trainer):
                                            predictor, trainer)
     return loss
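Most of ``train`` is elided by the diff. A sketch of the two phases described above; ``n_embedding_updates`` appears later in the file (set to 8), while ``n_parameter_updates`` is an assumed counterpart.

def train(g, label_nodes, steady_state_operator, predictor, trainer):
    # Phase 1: move the embeddings toward an (approximately) steady state.
    for _ in range(n_embedding_updates):
        update_embeddings(g, steady_state_operator)
    # Phase 2: update the parameters against that steady state.
    for _ in range(n_parameter_updates):
        loss = fullgraph_update_parameters(g, label_nodes, steady_state_operator,
                                           predictor, trainer)
    return loss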
 ##############################################################################
-# Scaling up with Stochastic Subgraph Training
+# Scaling up with stochastic subgraph training
 # --------------------------------------------
 #
 # The computation time per update is linear to the number of edges in a
 # graph. If we have a gigantic graph with billions of nodes and edges, the
 # update function would be inefficient.
 #
-# A possible improvement draws analogy from minibatch training on large
+# A possible improvement draws an analogy from mini-batch training on large
-# datasets: instead of computing gradients on the entire graph, we only
+# datasets. Instead of computing gradients on the entire graph, only
 # consider some subgraphs randomly sampled from the labelled nodes.
-# Mathematically, we have the following loss function:
+# Mathematically, you have the following loss function:
 #
 # .. math::
 #
@@ -393,25 +393,25 @@ def train(g, label_nodes, steady_state_operator, predictor, trainer):
 # where :math:`\calv_y^{(k)}` is the subset sampled for iteration
 # :math:`k`.
 #
-# In this training procedure, we also update node embeddings only on
+# In this training procedure, you also update node embeddings only on
 # sampled subgraphs, which is perhaps not surprising if you know
 # stochastic fixed-point iteration.
 #
 # Neighbor sampling
 # ~~~~~~~~~~~~~~~~~
 #
-# We use *neighbor sampling* as our subgraph sampling strategy. Neighbor
+# You can use *neighbor sampling* as a subgraph sampling strategy. Neighbor
-# sampling traverses small neighborhoods from seed nodes with BFS. For
+# sampling traverses small neighborhoods from seed nodes with breadth first search. For
 # each newly sampled node, a small subset of neighboring nodes are sampled
 # and added to the subgraph along with the connecting edges, unless the
 # node reaches the maximum of :math:`k` hops from the seeding node.
 #
-# The following shows neighbor sampling with 2 seed nodes at a time, a
+# The following shows neighbor sampling with two seed nodes at a time, a
-# maximum of 2 hops, and a maximum of 3 neighboring nodes.
+# maximum of two hops, and a maximum of three neighboring nodes.
 #
 # |image1|
 #
-# DGL supports very efficient subgraph sampling natively to help users
+# DGL supports very efficient subgraph sampling natively. This helps users
 # scale algorithms to large graphs. Currently, DGL provides the
 # :func:`~dgl.contrib.sampling.sampler.NeighborSampler`
 # API, which returns a subgraph iterator that samples multiple subgraphs
@@ -429,7 +429,7 @@ for subg in sampler:
     seeds.append(subg.layer_parent_nid(-1))
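The construction of ``sampler`` is elided in this hunk. A sketch of how ``NeighborSampler`` could be configured to match the description above (two seeds per subgraph, at most two hops, at most three sampled neighbors); the exact keyword arguments and the use of ``nodes_train`` as seed nodes are assumptions based on the contrib sampling API of that era.

from dgl.contrib.sampling import NeighborSampler

sampler = NeighborSampler(g, batch_size=2, expand_factor=3, num_hops=2,
                          seed_nodes=nodes_train, shuffle=True)

seeds = []
for subg in sampler:
    # Pull features and embeddings from the parent graph into the sampled block,
    # and remember which parent nodes form the last layer (the seed nodes).
    subg.copy_from_parent()
    seeds.append(subg.layer_parent_nid(-1))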
 ##############################################################################
-# Sampler with DGL
+# Sample training with DGL
 # ~~~~~~~~~~~~~~~~
 #
 # The code illustrates the training process in mini-batches.
@@ -512,7 +512,7 @@ def train_on_subgraphs(g, label_nodes, batch_size,
     return loss
 ##############################################################################
-# We also define a helper function that reports prediction accuracy:
+# You can also define a helper function that reports prediction accuracy.

 def test(g, test_nodes, predictor):
     z = predictor(g.ndata['h'][test_nodes])
@@ -522,7 +522,7 @@ def test(g, test_nodes, predictor):
     return accuracy.asnumpy()[0], z
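Only the first and last lines of ``test`` survive in the diff. A sketch of the middle, assuming an argmax decision rule over the predictor's logits:

def test(g, test_nodes, predictor):
    z = predictor(g.ndata['h'][test_nodes])
    y_pred = mx.nd.argmax(z, axis=1)                        # predicted class per node
    y_true = g.ndata['y'][test_nodes].reshape(y_pred.shape)
    accuracy = (y_pred == y_true).mean()
    return accuracy.asnumpy()[0], z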
 ##############################################################################
-# Some routine preparations for training:
+# Some routine preparations for training.
 #

 lr = 1e-3
 activation = 'relu'
@@ -536,8 +536,8 @@ params.update(predictor.collect_params())
 trainer = gluon.Trainer(params, 'adam', {'learning_rate': lr})
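Only a few of the preparation lines appear in the diff (``lr``, ``activation``, the parameter-dictionary update from the hunk header, and the trainer). A sketch of the full set-up; ``n_hidden`` and the explicit ``initialize()`` calls are assumptions.

lr = 1e-3
activation = 'relu'
n_hidden = 16                                   # illustrative hidden size

steady_state_operator = FullGraphSteadyStateOperator(n_hidden, activation)
predictor = Predictor(n_hidden, activation)
steady_state_operator.initialize()
predictor.initialize()

params = steady_state_operator.collect_params()
params.update(predictor.collect_params())
trainer = gluon.Trainer(params, 'adam', {'learning_rate': lr})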
 ##############################################################################
-# Now let’s train it! As before, nodes reachable from :math:`s` are
+# Now train it. As before, nodes reachable from :math:`s` are
-# gradually “infected”, except that behind the scene is a neural network!
+# gradually infected, except that in the background is a neural network.
 #

 n_epochs = 35
 n_embedding_updates = 8
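The epoch loop itself is elided (``for i in range(n_epochs):`` shows up only in the next hunk header). A sketch of one way to drive mini-batch training and evaluation; ``batch_size`` and the trailing arguments of ``train_on_subgraphs`` beyond those in its visible signature are assumptions.

n_epochs = 35
n_embedding_updates = 8
batch_size = 2                                  # illustrative mini-batch size

for i in range(n_epochs):
    # Mini-batch training on sampled subgraphs, then evaluation on held-out nodes.
    loss = train_on_subgraphs(g, nodes_train, batch_size,
                              steady_state_operator, predictor, trainer)
    accuracy, z = test(g, nodes_test, predictor)
    print('Epoch %d: loss %.4f, test accuracy %.4f' % (i, loss, accuracy))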
@@ -559,13 +559,13 @@ for i in range(n_epochs):
 ##############################################################################
 # |image2|
 #
-# In this tutorial, we use a very small toy graph to demonstrate the
+# In this tutorial, you used a very small example graph to demonstrate the
 # subgraph training for easy visualization. Subgraph training actually
-# helps us scale to gigantic graphs. For instance, we have successfully
+# helps you scale to gigantic graphs. For instance,
-# scaled SSE to a graph with 50 million nodes and 150 million edges in a
+# scaling SSE to a graph with 50 million nodes and 150 million edges in a
-# single P3.8x large instance and one epoch only takes about 160 seconds.
+# single P3.8x large instance, and one epoch, only takes about 160 seconds.
 #
-# See full examples `here <https://github.com/dmlc/dgl/tree/master/examples/mxnet/sse>`_.
+# For full examples, see `Benchmark SSE on multi-GPUs <https://github.com/dmlc/dgl/tree/master/examples/mxnet/sse>`_ on Github.
 #
 # .. |image0| image:: https://s3.us-east-2.amazonaws.com/dgl.ai/tutorial/img/floodfill-paths.gif
 # .. |image1| image:: https://s3.us-east-2.amazonaws.com/dgl.ai/tutorial/img/neighbor-sampling.gif
...