"""
.. _model-gcn:

Graph Convolutional Network
====================================

**Authors:** `Qi Huang <https://github.com/HQ01>`_, `Minjie Wang <https://jermainewang.github.io/>`_,
Yu Gai, Quan Gan, Zheng Zhang

.. warning::

    This tutorial aims to provide insight into the paper, using code as a
    means of explanation. The implementation is thus NOT optimized for
    running efficiency. For a recommended implementation, please refer to the
    `official examples <https://github.com/dmlc/dgl/tree/master/examples>`_.

This is a gentle introduction to using DGL to implement Graph Convolutional
Networks (Kipf & Welling, `Semi-Supervised Classification with Graph
Convolutional Networks <https://arxiv.org/pdf/1609.02907.pdf>`_). We explain
what is under the hood of the :class:`~dgl.nn.GraphConv` module.

The reader is expected to learn how to define a new GNN layer using DGL's
message passing APIs.
"""

###############################################################################
# Model Overview
# ------------------------------------------
# GCN from the perspective of message passing
# ```````````````````````````````````````````````
# We describe a layer of the graph convolutional neural network from a message
# passing perspective; the math can be found `here <math_>`_.
# It boils down to the following steps, for each node :math:`u`:
#
# 1. Aggregate neighbors' representations :math:`h_{v}` to produce an
#    intermediate representation :math:`\hat{h}_u`.
# 2. Transform the aggregated representation :math:`\hat{h}_{u}` with a linear
#    projection followed by a non-linearity: :math:`h_{u} = f(W_{u} \hat{h}_u)`.
#
# We will implement step 1 with DGL message passing, and step 2 with a
# PyTorch ``nn.Module``.
#
# GCN implementation with DGL
# ``````````````````````````````````````````
# We first define the message and reduce functions as usual. Since the
# aggregation on a node :math:`u` only involves summing over the neighbors'
# representations :math:`h_v`, we can simply use builtin functions:

import torch as th
import torch.nn as nn
import torch.nn.functional as F

import dgl
import dgl.function as fn
gcn_msg = fn.copy_u(u="h", out="m")
gcn_reduce = fn.sum(msg="m", out="h")

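###############################################################################
# The builtin pair above is equivalent to the user-defined message and reduce
# functions sketched below (for illustration only; the builtins are preferred
# because they map to fused sparse kernels):


def gcn_msg_udf(edges):
    # Copy each source node's feature 'h' onto the edge as the message 'm'.
    return {"m": edges.src["h"]}


def gcn_reduce_udf(nodes):
    # Sum the incoming messages 'm' in each node's mailbox into a new 'h'.
    return {"h": th.sum(nodes.mailbox["m"], dim=1)}
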
###############################################################################
# We then proceed to define the GCNLayer module. A GCNLayer essentially performs
# message passing on all the nodes and then applies a fully-connected layer.
#
# .. note::
#
#    This shows how to implement a GCN from scratch. DGL provides a more
#    efficient :class:`builtin GCN layer module <dgl.nn.pytorch.conv.GraphConv>`.
#


class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, feature):
        # Creating a local scope so that all the stored ndata and edata
        # (such as the `'h'` ndata below) are automatically popped out
        # when the scope exits.
        with g.local_scope():
            g.ndata["h"] = feature
            g.update_all(gcn_msg, gcn_reduce)
            h = g.ndata["h"]
            return self.linear(h)


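###############################################################################
# As a quick sanity check (illustrative only; the toy graph and feature sizes
# are arbitrary), we can run the layer on a three-node cycle:

_g = dgl.graph((th.tensor([0, 1, 2]), th.tensor([1, 2, 0])))
_feat = th.randn(3, 5)
print(GCNLayer(5, 2)(_g, _feat).shape)  # torch.Size([3, 2])
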
###############################################################################
# The forward function is essentially the same as that of any other commonly
# seen NN model in PyTorch. We can initialize GCN like any ``nn.Module``. For
# example, let's define a simple neural network consisting of two GCN layers.
# Suppose we are training the classifier on the Cora dataset (the input
# feature size is 1433 and the number of classes is 7). The last GCN layer
# computes node embeddings, so the last layer in general does not apply an
# activation.


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = GCNLayer(1433, 16)
        self.layer2 = GCNLayer(16, 7)

    def forward(self, g, features):
        x = F.relu(self.layer1(g, features))
        x = self.layer2(g, x)
        return x


net = Net()
print(net)

###############################################################################
# We load the Cora dataset using DGL's built-in data module.

from dgl.data import CoraGraphDataset


def load_cora_data():
    dataset = CoraGraphDataset()
    g = dataset[0]
    features = g.ndata["feat"]
    labels = g.ndata["label"]
    train_mask = g.ndata["train_mask"]
    test_mask = g.ndata["test_mask"]
    return g, features, labels, train_mask, test_mask


###############################################################################
# Once the model is trained, we can use the following method to evaluate
# its performance on the test dataset:


def evaluate(model, g, features, labels, mask):
    model.eval()
    with th.no_grad():
        logits = model(g, features)
        # Only score the nodes selected by the mask (e.g. the test set).
        logits = logits[mask]
        labels = labels[mask]
        # The prediction for each node is the class with the largest logit.
        _, indices = th.max(logits, dim=1)
        correct = th.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)


###############################################################################
# We then train the network as follows:

import time

import numpy as np

g, features, labels, train_mask, test_mask = load_cora_data()
# Add edges between each node and itself to preserve old node representations
g.add_edges(g.nodes(), g.nodes())
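# (Equivalently, ``g = dgl.add_self_loop(g)`` achieves the same effect; note
# that it appends self-loops without removing any that already exist.)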
optimizer = th.optim.Adam(net.parameters(), lr=1e-2)
dur = []
for epoch in range(50):
    if epoch >= 3:
        t0 = time.time()

    net.train()
    logits = net(g, features)
    logp = F.log_softmax(logits, 1)
    loss = F.nll_loss(logp[train_mask], labels[train_mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch >= 3:
        dur.append(time.time() - t0)

    acc = evaluate(net, g, features, labels, test_mask)
    print(
        "Epoch {:05d} | Loss {:.4f} | Test Acc {:.4f} | Time(s) {:.4f}".format(
            epoch, loss.item(), acc, np.mean(dur) if dur else float("nan")
        )
    )

###############################################################################
# .. _math:
#
# GCN in one formula
# ------------------
# Mathematically, the GCN model follows this formula:
#
# :math:`H^{(l+1)} = \sigma(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)})`
#
# Here, :math:`H^{(l)}` denotes the matrix of node representations at the
# :math:`l^{th}` layer of the network, :math:`\sigma` is the non-linearity, and
# :math:`W^{(l)}` is the weight matrix for this layer. :math:`\tilde{D}` and
# :math:`\tilde{A}` are respectively the degree and adjacency matrices of the
# graph. The superscript ~ denotes the variant where we add an edge between
# each node and itself so that a node's old representation is preserved by the
# graph convolution. The input :math:`H^{(0)}` has shape :math:`N \times D`,
# where :math:`N` is the number of nodes and :math:`D` is the number of input
# features. We can chain up multiple such layers to produce a node-level
# representation output with shape :math:`N \times F`, where :math:`F` is the
# dimension of the output node feature vector.
#
# The equation can be efficiently implemented using sparse matrix
# multiplication kernels (as in Kipf's
# `pygcn <https://github.com/tkipf/pygcn>`_ code). The above DGL implementation
# in fact already uses this trick, thanks to the builtin functions.
#
# Note that the tutorial code implements a simplified version of GCN where we
# replace :math:`\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}` with
# :math:`\tilde{A}`. For a full implementation, see our example
# `here  <https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcn>`_.
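
###############################################################################
# As a minimal sketch (a hypothetical ``NormalizedGCNLayer``, not the official
# example), the symmetric normalization can be folded into the layer itself by
# scaling features with :math:`\tilde{D}^{-\frac{1}{2}}` before and after the
# message passing:


class NormalizedGCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(NormalizedGCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, feature):
        with g.local_scope():
            # D~^{-1/2} per node; clamping avoids division by zero on
            # isolated nodes (with self-loops added, all degrees are >= 1).
            norm = th.pow(g.in_degrees().float().clamp(min=1), -0.5)
            norm = norm.unsqueeze(-1)
            g.ndata["h"] = feature * norm  # scale by D~^{-1/2}
            g.update_all(gcn_msg, gcn_reduce)  # multiply by A~
            h = g.ndata["h"] * norm  # scale by D~^{-1/2} again
            return self.linear(h)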