"""
.. _model-gcn:

Graph Convolutional Network
====================================

**Author:** `Qi Huang <https://github.com/HQ01>`_, `Minjie Wang  <https://jermainewang.github.io/>`_,
Yu Gai, Quan Gan, Zheng Zhang

.. warning::

    The tutorial aims at gaining insights into the paper, with code as a
    means of explanation. The implementation is thus NOT optimized for
    running efficiency. For the recommended implementation, please refer to
    the `official examples <https://github.com/dmlc/dgl/tree/master/examples>`_.

This is a gentle introduction to using DGL to implement Graph Convolutional
Networks (Kipf & Welling, `Semi-Supervised Classification with Graph
Convolutional Networks <https://arxiv.org/pdf/1609.02907.pdf>`_). We explain
what is under the hood of the :class:`~dgl.nn.GraphConv` module.
The reader is expected to learn how to define a new GNN layer using DGL's
message passing APIs.
"""

###############################################################################
# Model Overview
# ------------------------------------------
# GCN from the perspective of message passing
# ```````````````````````````````````````````````
# We describe a layer of a graph convolutional network from a message
# passing perspective; the math can be found `here <math_>`_.
# It boils down to the following steps, for each node :math:`u`:
#
# 1) Aggregate neighbors' representations :math:`h_{v}` to produce an
#    intermediate representation :math:`\hat{h}_u`.
#
# 2) Transform the aggregated representation :math:`\hat{h}_{u}` with a
#    linear projection followed by a non-linearity:
#    :math:`h_{u} = f(W_{u} \hat{h}_u)`.
# 
# We will implement step 1 with DGL message passing, and step 2 with
# PyTorch's ``nn.Module``.
# 
# GCN implementation with DGL
# ``````````````````````````````````````````
# We first define the message and reduce functions as usual. Since the
# aggregation on a node :math:`u` only involves summing over the neighbors'
# representations :math:`h_v`, we can simply use builtin functions:

import dgl
import dgl.function as fn
import torch as th
import torch.nn as nn
import torch.nn.functional as F

gcn_msg = fn.copy_u(u='h', out='m')
gcn_reduce = fn.sum(msg='m', out='h')
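
###############################################################################
# For intuition, the two builtins above are roughly equivalent to the
# following user-defined message and reduce functions (a sketch; the builtin
# versions are fused into optimized kernels and run much faster):

def gcn_msg_udf(edges):
    # Copy each source node's 'h' feature onto the edge as the message 'm'.
    return {'m': edges.src['h']}

def gcn_reduce_udf(nodes):
    # Sum the incoming messages 'm' in each node's mailbox into a new 'h'.
    return {'h': th.sum(nodes.mailbox['m'], dim=1)}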

###############################################################################
# We then proceed to define the GCNLayer module. A GCNLayer essentially
# performs message passing on all the nodes and then applies a
# fully-connected layer.
#
# .. note::
#
#    This shows how to implement a GCN from scratch.  DGL provides a more
#    efficient :class:`builtin GCN layer module <dgl.nn.pytorch.conv.GraphConv>`.
#

class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, feature):
        # Creating a local scope so that all the stored ndata and edata
        # (such as the `'h'` ndata below) are automatically popped out
        # when the scope exits.
        with g.local_scope():
            g.ndata['h'] = feature
            g.update_all(gcn_msg, gcn_reduce)
            h = g.ndata['h']
            return self.linear(h)
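
###############################################################################
# For comparison, the same layer can also be instantiated with DGL's builtin
# module mentioned in the note above. Keep in mind that, unlike our
# ``GCNLayer``, ``GraphConv`` by default also applies the symmetric
# normalization discussed in the formula at the end of this tutorial, so the
# two are not numerically identical.

from dgl.nn.pytorch import GraphConv

builtin_layer = GraphConv(in_feats=1433, out_feats=16)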

###############################################################################
# The forward function is essentially the same as that of any other common
# PyTorch model.  We can initialize GCN like any ``nn.Module``. For example,
# let's define a simple neural network consisting of two GCN layers. Suppose
# we are training the classifier for the Cora dataset (the input feature
# size is 1433 and the number of classes is 7). The last GCN layer computes
# node embeddings, so the last layer in general does not apply an
# activation.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = GCNLayer(1433, 16)
        self.layer2 = GCNLayer(16, 7)

    def forward(self, g, features):
        x = F.relu(self.layer1(g, features))
        x = self.layer2(g, x)
        return x


net = Net()
print(net)
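
###############################################################################
# Printing the network should show the module structure, roughly along the
# lines of:
#
# .. code-block:: none
#
#    Net(
#      (layer1): GCNLayer(
#        (linear): Linear(in_features=1433, out_features=16, bias=True)
#      )
#      (layer2): GCNLayer(
#        (linear): Linear(in_features=16, out_features=7, bias=True)
#      )
#    )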

###############################################################################
# We load the Cora dataset using DGL's built-in data module.

from dgl.data import CoraGraphDataset
def load_cora_data():
    dataset = CoraGraphDataset()
    g = dataset[0]
    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    test_mask = g.ndata['test_mask']
    return g, features, labels, train_mask, test_mask
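
###############################################################################
# As a quick sanity check (optional, and not required by the training code
# below), we can inspect the loaded graph and splits. The exact statistics
# may vary slightly across DGL dataset versions.

_g, _feat, _labels, _train_mask, _test_mask = load_cora_data()
print(_g)                                # graph summary
print('feature shape:', _feat.shape)     # expected: torch.Size([2708, 1433])
print('train/test sizes:',
      _train_mask.sum().item(), _test_mask.sum().item())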

###############################################################################
# Once the model is trained, we can use the following method to evaluate
# its performance on the test dataset:

def evaluate(model, g, features, labels, mask):
    model.eval()
    with th.no_grad():
        logits = model(g, features)
        logits = logits[mask]                # restrict to the masked nodes
        labels = labels[mask]
        _, indices = th.max(logits, dim=1)   # predicted class = argmax
        correct = th.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)

###############################################################################
# We then train the network as follows:

import time
import numpy as np
g, features, labels, train_mask, test_mask = load_cora_data()
# Add edges between each node and itself to preserve old node representations
g.add_edges(g.nodes(), g.nodes())
optimizer = th.optim.Adam(net.parameters(), lr=1e-2)
dur = []
for epoch in range(50):
    if epoch >= 3:
        t0 = time.time()

    net.train()
    logits = net(g, features)
    logp = F.log_softmax(logits, dim=1)
    loss = F.nll_loss(logp[train_mask], labels[train_mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch >= 3:
        dur.append(time.time() - t0)

    acc = evaluate(net, g, features, labels, test_mask)
    print("Epoch {:05d} | Loss {:.4f} | Test Acc {:.4f} | Time(s) {:.4f}".format(
        epoch, loss.item(), acc, np.mean(dur) if dur else 0))
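
###############################################################################
# After training, the same ``evaluate`` helper (or the raw logits) can be
# reused for inference on the full graph. A short sketch:

net.eval()
with th.no_grad():
    pred = net(g, features).argmax(dim=1)  # predicted class for every node
print('Final test accuracy:', evaluate(net, g, features, labels, test_mask))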

###############################################################################
# .. _math:
#
# GCN in one formula
# ------------------
# Mathematically, the GCN model follows this formula:
#
# :math:`H^{(l+1)} = \sigma(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)})`
#
# Here, :math:`H^{(l)}` denotes the node representations at the :math:`l^{th}`
# layer of the network, :math:`\sigma` is the non-linearity, and
# :math:`W^{(l)}` is the weight matrix for this layer. :math:`\tilde{D}` and
# :math:`\tilde{A}` are respectively the degree and adjacency matrices of the
# graph. With the tilde, we refer to the variant where we add additional
# edges between each node and itself to preserve its old representation in
# graph convolutions. The shape of the input :math:`H^{(0)}` is
# :math:`N \times D`, where :math:`N` is the number of nodes and :math:`D` is
# the number of input features. We can chain up multiple layers as such to
# produce a node-level representation output with shape :math:`N \times F`,
# where :math:`F` is the dimension of the output node feature vector.
#
# The equation can be efficiently implemented using sparse matrix
# multiplication kernels (such as Kipf's
# `pygcn <https://github.com/tkipf/pygcn>`_ code). The above DGL
# implementation has in fact already used this trick, thanks to the use of
# builtin functions.
#
# Note that the tutorial code implements a simplified version of GCN where we
# replace :math:`\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}` with
# :math:`\tilde{A}`. For a full implementation, see our example
# `here  <https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcn>`_.
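
###############################################################################
# To make the sparse-multiplication view above concrete, here is a minimal,
# self-contained sketch (not part of the tutorial pipeline) of one graph
# convolution written as matrix products. The adjacency values below are
# placeholders; a faithful implementation would store the entries of
# :math:`\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}` instead.

N, D_in, F_out = 4, 5, 3
# A toy symmetric adjacency with self-loops, as a torch sparse tensor.
src = th.tensor([0, 1, 2, 3, 0, 1, 2, 3])
dst = th.tensor([1, 0, 3, 2, 0, 1, 2, 3])
A_hat = th.sparse_coo_tensor(th.stack([dst, src]), th.ones(8), (N, N))
H = th.randn(N, D_in)                         # input features H^{(0)}
W = th.randn(D_in, F_out)                     # layer weight W^{(0)}
H_next = th.relu(th.sparse.mm(A_hat, H @ W))  # H^{(1)} = sigma(A_hat H W)
print(H_next.shape)                           # torch.Size([4, 3])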