Unverified commit 28117cd9, authored by xiang song (charlie.song), committed by GitHub

[Example] GCMC with sampling (#1296)



* gcmc example

* Update Readme

* Add multiprocess support

* Fix

* Multigpu + dataloader

* Delete some dead code

* Delete more

* upd

* Add README

* upd

* Upd

* combine full batch and sample GCMCLayer, use HeteroCov

* Fix

* Update Readme

* udp

* Fix typo

* Add cpu run

* some fix and docstring
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-63-71.ec2.internal>
parent bd1e48a5
README.md
@@ -11,41 +11,217 @@ Credit: Jiani Zhang ([@jennyzhang0215](https://github.com/jennyzhang0215))
## Dependencies
* PyTorch 1.2+
* pandas
* torchtext 0.4+ (if using user and item contents as node features)

## Data
Supported datasets: ml-100k, ml-1m, ml-10m

## How to run
### Train with full-graph
ml-100k, no feature
```bash
python3 train.py --data_name=ml-100k --use_one_hot_fea --gcn_agg_accum=stack
```
Results: RMSE=0.9088 (0.910 reported)
Speed: 0.0410s/epoch (vanilla implementation: 0.1008s/epoch)

ml-100k, with feature
```bash
python3 train.py --data_name=ml-100k --gcn_agg_accum=stack
```
Results: RMSE=0.9448 (0.905 reported)

ml-1m, no feature
```bash
python3 train.py --data_name=ml-1m --gcn_agg_accum=sum --use_one_hot_fea
```
Results: RMSE=0.8377 (0.832 reported)
Speed: 0.0844s/epoch (vanilla implementation: 1.538s/epoch)

ml-10m, no feature
```bash
python3 train.py --data_name=ml-10m --gcn_agg_accum=stack --gcn_dropout=0.3 \
                 --train_lr=0.001 --train_min_lr=0.0001 --train_max_iter=15000 \
                 --use_one_hot_fea --gen_r_num_basis_func=4
```
Results: RMSE=0.7800 (0.777 reported)
Speed: 1.1982s/epoch (vanilla implementation: OOM)

Testbed: EC2 p3.2xlarge instance (Amazon Linux 2)
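The `--gcn_agg_accum` flag chooses how per-rating messages are combined in the encoder (see `GCMCLayer` in model.py below: with `stack`, `msg_units` is divided by the number of rating values, so the per-rating slices are concatenated; with `sum` they are pooled). A toy illustration of the two behaviors, with made-up shapes:
```python
import torch as th

msgs = [th.randn(4, 8) for _ in range(5)]   # one message block per rating value
stacked = th.cat(msgs, dim=1)               # 'stack': (4, 40), one slice per rating
summed = th.stack(msgs, dim=0).sum(0)       # 'sum':   (4, 8), ratings pooled
```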
### Train with minibatch on a single GPU
ml-100k, no feature
```bash
python3 train_sampling.py --data_name=ml-100k \
--use_one_hot_fea \
--gcn_agg_accum=stack \
--gpu 0
```
ml-100k, no feature, with mix_cpu_gpu. In a mix_cpu_gpu run with no features, W_r is stored on the CPU by default rather than on the GPU.
```bash
python3 train_sampling.py --data_name=ml-100k \
--use_one_hot_fea \
--gcn_agg_accum=stack \
--mix_cpu_gpu \
--gpu 0
```
Results: RMSE=0.9380
Speed: 1.059s/epoch (run with 70 epochs)
Speed: 1.046s/epoch (mix_cpu_gpu)
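In mix_cpu_gpu mode the large per-rating weight matrices stay on the CPU and only the rows a minibatch actually touches are copied to the GPU; this matches the row-gather in `dot_or_identity` in model.py below. A minimal sketch of that lookup pattern, with made-up sizes:
```python
import torch as th

W_r = th.randn(100000, 64)       # big per-rating weight matrix, kept on the CPU
nids = th.tensor([3, 17, 42])    # node IDs appearing in the current minibatch
h = W_r[nids].to('cuda:0')       # gather on the CPU, copy only 3 rows to the GPU
```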
ml-100k, with feature
```bash
python3 train_sampling.py --data_name=ml-100k \
--gcn_agg_accum=stack \
--train_max_epoch 90 \
--gpu 0
```
Results: RMSE=0.9574
ml-1m, no feature
```bash
python3 train_sampling.py --data_name=ml-1m \
--gcn_agg_accum=sum \
--use_one_hot_fea \
--train_max_epoch 160 \
--gpu 0
```
ml-1m, no feature, with mix_cpu_gpu
```bash
python3 train_sampling.py --data_name=ml-1m \
--gcn_agg_accum=sum \
--use_one_hot_fea \
--train_max_epoch 60 \
--mix_cpu_gpu \
--gpu 0
```
Results: RMSE=0.8632
Speed: 7.852s/epoch (run with 60 epochs)
Speed: 7.788s/epoch (mix_cpu_gpu)
ml-10m, no feature
```bash
python3 train_sampling.py --data_name=ml-10m \
--gcn_agg_accum=stack \
--gcn_dropout=0.3 \
--train_lr=0.001 \
--train_min_lr=0.0001 \
--train_max_epoch=60 \
--use_one_hot_fea \
--gen_r_num_basis_func=4 \
--gpu 0
```
ml-10m, no feature, with mix_cpu_gpu
```bash
python3 train_sampling.py --data_name=ml-10m \
--gcn_agg_accum=stack \
--gcn_dropout=0.3 \
--train_lr=0.001 \
--train_min_lr=0.0001 \
--train_max_epoch=60 \
--use_one_hot_fea \
--gen_r_num_basis_func=4 \
--mix_cpu_gpu \
--gpu 0
```
Results: RMSE=0.8050
Speed: 394.304s/epoch (run with 60 epochs)
Speed: 408.749s/epoch (mix_cpu_gpu)
Testbed: EC2 p3.2xlarge instance
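Schematically, the minibatch runs above iterate over batches of rating edges rather than training on the full graph at once. A simplified sketch of one epoch (not the actual train_sampling.py code, which also builds sampled encoder/decoder subgraphs for each batch; sizes are hypothetical):
```python
import torch as th

num_pairs = 80000      # number of training (user, movie) rating pairs
batch_size = 1024      # hypothetical batch size

perm = th.randperm(num_pairs)
for start in range(0, num_pairs, batch_size):
    batch = perm[start:start + batch_size]
    # 1. build the encoder/decoder subgraphs around the pairs in `batch`
    # 2. run GCMCLayer on the subgraph, then the decoder on the batch edges
    # 3. cross-entropy loss against the true ratings, optimizer step
```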
### Train with minibatch on multi-GPU
ml-100k, no feature
```bash
python train_sampling.py --data_name=ml-100k \
--gcn_agg_accum=stack \
--train_max_epoch 30 \
--train_lr 0.02 \
--use_one_hot_fea \
--gpu 0,1,2,3,4,5,6,7
```
ml-100k, no feature, with mix_cpu_gpu
```bash
python train_sampling.py --data_name=ml-100k \
--gcn_agg_accum=stack \
--train_max_epoch 30 \
--train_lr 0.02 \
--use_one_hot_fea \
--mix_cpu_gpu \
--gpu 0,1,2,3,4,5,6,7
```
Results: RMSE=0.9397
Speed: 1.202s/epoch (run with only 30 epochs)
Speed: 1.245s/epoch (mix_cpu_gpu)
ml-100k, with feature
```bash
python train_sampling.py --data_name=ml-100k \
--gcn_agg_accum=stack \
--train_max_epoch 30 \
--gpu 0,1,2,3,4,5,6,7
```
Results: RMSE=0.9655
Speed: 1.265s/epoch (run with 30 epochs)
ml-1m, no feature
```bash
python train_sampling.py --data_name=ml-1m \
--gcn_agg_accum=sum \
--train_max_epoch 40 \
--use_one_hot_fea \
--gpu 0,1,2,3,4,5,6,7
```
ml-1m, no feature, with mix_cpu_gpu
```bash
python train_sampling.py --data_name=ml-1m \
--gcn_agg_accum=sum \
--train_max_epoch 40 \
--use_one_hot_fea \
--mix_cpu_gpu \
--gpu 0,1,2,3,4,5,6,7
```
Results: RMSE=0.8621
Speed: 11.612s/epoch (run with 40 epochs)
Speed: 12.483s/epoch (mix_cpu_gpu)
ml-10m, no feature
```bash
python train_sampling.py --data_name=ml-10m \
--gcn_agg_accum=stack \
--gcn_dropout=0.3 \
--train_lr=0.001 \
--train_min_lr=0.0001 \
--train_max_epoch=30 \
--use_one_hot_fea \
--gen_r_num_basis_func=4 \
--gpu 0,1,2,3,4,5,6,7
```
ml-10m, no feature, with mix_cpu_gpu
```bash
python train_sampling.py --data_name=ml-10m \
--gcn_agg_accum=stack \
--gcn_dropout=0.3 \
--train_lr=0.001 \
--train_min_lr=0.0001 \
--train_max_epoch=30 \
--use_one_hot_fea \
--gen_r_num_basis_func=4 \
--mix_cpu_gpu \
--gpu 0,1,2,3,4,5,6,7
```
Results: RMSE=0.8084
Speed: 632.868s/epoch (run with 30 epochs)
Speed: 633.397s/epoch (mix_cpu_gpu)
Testbed: EC2 p3.16xlarge instance
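Per the commit history above ("Add multiprocess support"), the multi-GPU runs presumably launch one training process per device in the `--gpu` list. A hypothetical skeleton of that pattern with `torch.multiprocessing`, not the actual train_sampling.py code:
```python
import torch.multiprocessing as mp

def run_worker(proc_id, devices):
    # Hypothetical worker: each process trains on its own GPU and its own
    # shard of the rating edges; gradient synchronization is omitted.
    device = devices[proc_id]
    print('worker %d training on cuda:%d' % (proc_id, device))

if __name__ == '__main__':
    devices = [0, 1, 2, 3, 4, 5, 6, 7]   # parsed from --gpu 0,1,2,3,4,5,6,7
    mp.spawn(run_worker, args=(devices,), nprocs=len(devices))
```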
### Train with minibatch on CPU
ml-100k, no feature
```bash
python3 train_sampling.py --data_name=ml-100k \
--use_one_hot_fea \
--gcn_agg_accum=stack \
--gpu -1
```
Speed: 1.591s/epoch
Testbed: EC2 r5.xlarge instance
data.py
@@ -5,8 +5,6 @@ import re
import pandas as pd
import scipy.sparse as sp
import torch as th
import dgl
from dgl.data.utils import download, extract_archive, get_download_dir
@@ -84,6 +82,8 @@ class MovieLens(object):
        Dataset name. Could be "ml-100k", "ml-1m", "ml-10m"
    device : torch.device
        Device context
    mix_cpu_gpu : bool, optional
        If true, the ``user_feature`` attribute is stored on the CPU
    use_one_hot_fea : bool, optional
        If true, the ``user_feature`` attribute is None, representing a one-hot identity
        matrix. (Default: False)
@@ -96,7 +96,8 @@ class MovieLens(object):
        Ratio of validation data
    """
    def __init__(self, name, device, mix_cpu_gpu=False,
                 use_one_hot_fea=False, symm=True,
                 test_ratio=0.1, valid_ratio=0.1):
        self._name = name
        self._device = device
@@ -164,8 +165,13 @@ class MovieLens(object):
            self.user_feature = None
            self.movie_feature = None
        else:
            # if mix_cpu_gpu, keep the features on the CPU
            if mix_cpu_gpu:
                self.user_feature = th.FloatTensor(self._process_user_fea())
                self.movie_feature = th.FloatTensor(self._process_movie_fea())
            else:
                self.user_feature = th.FloatTensor(self._process_user_fea()).to(self._device)
                self.movie_feature = th.FloatTensor(self._process_movie_fea()).to(self._device)

        if self.user_feature is None:
            self.user_feature_shape = (self.num_user, self.num_user)
            self.movie_feature_shape = (self.num_movie, self.num_movie)
@@ -204,6 +210,7 @@ class MovieLens(object):
        def _npairs(graph):
            rst = 0
            for r in self.possible_rating_values:
                r = str(r).replace('.', '_')
                rst += graph.number_of_edges(str(r))
            return rst
@@ -245,9 +252,10 @@ class MovieLens(object):
            ridx = np.where(rating_values == rating)
            rrow = rating_row[ridx]
            rcol = rating_col[ridx]
            rating = str(rating).replace('.', '_')
            bg = dgl.bipartite((rrow, rcol), 'user', rating, 'movie',
                               num_nodes=(self._num_user, self._num_movie))
            rev_bg = dgl.bipartite((rcol, rrow), 'movie', 'rev-%s' % rating, 'user',
                                   num_nodes=(self._num_movie, self._num_user))
            rating_graphs.append(bg)
            rating_graphs.append(rev_bg)
@@ -267,7 +275,7 @@ class MovieLens(object):
        movie_ci = []
        movie_cj = []
        for r in self.possible_rating_values:
            r = str(r).replace('.', '_')
            user_ci.append(graph['rev-%s' % r].in_degrees())
            movie_ci.append(graph[r].in_degrees())
            if self._symm:
@@ -494,6 +502,8 @@ class MovieLens(object):
        Generate movie features by concatenating embedding and the year
        """
        import torchtext

        if self._name == 'ml-100k':
            GENRES = GENRES_ML_100K
        elif self._name == 'ml-1m':
@@ -503,8 +513,8 @@ class MovieLens(object):
        else:
            raise NotImplementedError

        TEXT = torchtext.data.Field(tokenize='spacy')
        embedding = torchtext.vocab.GloVe(name='840B', dim=300)
        title_embedding = np.zeros(shape=(self.movie_info.shape[0], 300), dtype=np.float32)
        release_years = np.zeros(shape=(self.movie_info.shape[0], 1), dtype=np.float32)
"""NN modules""" """NN modules"""
import torch as th import torch as th
import torch.nn as nn import torch.nn as nn
from torch.nn import init
import dgl.function as fn import dgl.function as fn
import dgl.nn.pytorch as dglnn
from utils import get_activation from utils import get_activation
class GCMCGraphConv(nn.Module):
    """Graph convolution module used in the GCMC model.

    Parameters
    ----------
    in_feats : int
        Input feature size.
    out_feats : int
        Output feature size.
    weight : bool, optional
        If True, apply a linear layer. Otherwise, aggregate the messages
        without a weight matrix or with a shared weight provided by the caller.
    device : str, optional
        Which device to put data on. Useful for mix_cpu_gpu training and
        multi-GPU training.
    """
    def __init__(self,
                 in_feats,
                 out_feats,
                 weight=True,
                 device=None,
                 dropout_rate=0.0):
        super(GCMCGraphConv, self).__init__()
        self._in_feats = in_feats
        self._out_feats = out_feats
        self.device = device
        self.dropout = nn.Dropout(dropout_rate)

        if weight:
            self.weight = nn.Parameter(th.Tensor(in_feats, out_feats))
        else:
            self.register_parameter('weight', None)
        self.reset_parameters()

    def reset_parameters(self):
        """Reinitialize learnable parameters."""
        if self.weight is not None:
            init.xavier_uniform_(self.weight)

    def forward(self, graph, feat, weight=None):
        """Compute graph convolution.

        Normalizer constant :math:`c_{ij}` is stored as two node data, "ci"
        and "cj".

        Parameters
        ----------
        graph : DGLGraph
            The graph.
        feat : torch.Tensor
            The input feature.
        weight : torch.Tensor, optional
            Optional external weight tensor.

        Returns
        -------
        torch.Tensor
            The output feature.
        """
        with graph.local_scope():
            cj = graph.srcdata['cj']
            ci = graph.dstdata['ci']
            if self.device is not None:
                cj = cj.to(self.device)
                ci = ci.to(self.device)
            if weight is not None:
                if self.weight is not None:
                    raise DGLError('External weight is provided while at the same time the'
                                   ' module has defined its own weight parameter. Please'
                                   ' create the module with flag weight=False.')
            else:
                weight = self.weight

            if weight is not None:
                feat = dot_or_identity(feat, weight, self.device)

            # left norm and dropout
            feat = feat * self.dropout(cj)
            graph.srcdata['h'] = feat
            graph.update_all(fn.copy_src(src='h', out='m'),
                             fn.sum(msg='m', out='h'))
            rst = graph.dstdata['h']
            # right norm
            rst = rst * ci

        return rst
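For reference, the convolution above computes, per destination node i (a sketch: `ci` and `cj` are the degree-based normalizers prepared in data.py; with `symm=True` they are presumably 1/sqrt(degree), following the GCMC paper):
```latex
h_i = c_i \sum_{j \in \mathcal{N}(i)} c_j \, W_r x_j
```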
class GCMCLayer(nn.Module):
    r"""GCMC layer
@@ -49,6 +138,9 @@ class GCMCLayer(nn.Module):
        If true, user node and movie node share the same set of parameters.
        Require ``user_in_units`` and ``movie_in_units`` to be the same.
        (Default: False)
    device : str, optional
        Which device to put data on. Useful for mix_cpu_gpu training and
        multi-GPU training.
    """
    def __init__(self,
                 rating_vals,
@@ -60,7 +152,8 @@ class GCMCLayer(nn.Module):
                 agg='stack',  # or 'sum'
                 agg_act=None,
                 out_act=None,
                 share_user_item_param=False,
                 device=None):
        super(GCMCLayer, self).__init__()
        self.rating_vals = rating_vals
        self.agg = agg
@@ -77,19 +170,57 @@ class GCMCLayer(nn.Module):
            msg_units = msg_units // len(rating_vals)
        self.dropout = nn.Dropout(dropout_rate)
        self.W_r = nn.ParameterDict()
        subConv = {}
        for rating in rating_vals:
            # PyTorch parameter name can't contain "."
            rating = str(rating).replace('.', '_')
            rev_rating = 'rev-%s' % rating
            if share_user_item_param and user_in_units == movie_in_units:
                self.W_r[rating] = nn.Parameter(th.randn(user_in_units, msg_units))
                self.W_r['rev-%s' % rating] = self.W_r[rating]
                subConv[rating] = GCMCGraphConv(user_in_units,
                                                msg_units,
                                                weight=False,
                                                device=device,
                                                dropout_rate=dropout_rate)
                subConv[rev_rating] = GCMCGraphConv(user_in_units,
                                                    msg_units,
                                                    weight=False,
                                                    device=device,
                                                    dropout_rate=dropout_rate)
            else:
                self.W_r = None
                subConv[rating] = GCMCGraphConv(user_in_units,
                                                msg_units,
                                                weight=True,
                                                device=device,
                                                dropout_rate=dropout_rate)
                subConv[rev_rating] = GCMCGraphConv(movie_in_units,
                                                    msg_units,
                                                    weight=True,
                                                    device=device,
                                                    dropout_rate=dropout_rate)
        # one GCMCGraphConv per rating relation; HeteroGraphConv combines the
        # per-relation outputs with `agg` ('stack' or 'sum')
        self.conv = dglnn.HeteroGraphConv(subConv, aggregate=agg)
        self.agg_act = get_activation(agg_act)
        self.out_act = get_activation(out_act)
        self.device = device
        self.reset_parameters()
    def partial_to(self, device):
        """Put the parameters onto the given device, except W_r.

        Parameters
        ----------
        device : torch device
            Which device the parameters are put on.
        """
        assert device == self.device
        if device is not None:
            self.ufc.cuda(device)
            if self.share_user_item_param is False:
                self.ifc.cuda(device)
            self.dropout.cuda(device)
    def reset_parameters(self):
        for p in self.parameters():
            if p.dim() > 1:
@@ -98,9 +229,6 @@ class GCMCLayer(nn.Module):
    def forward(self, graph, ufeat=None, ifeat=None):
        """Forward function

        Parameters
        ----------
        graph : DGLHeteroGraph
@@ -118,28 +246,19 @@ class GCMCLayer(nn.Module):
        new_ifeat : torch.Tensor
            New movie features
        """
        in_feats = {'user' : ufeat, 'movie' : ifeat}
        mod_args = {}
        for i, rating in enumerate(self.rating_vals):
            rating = str(rating).replace('.', '_')
            rev_rating = 'rev-%s' % rating
            mod_args[rating] = (self.W_r[rating] if self.W_r is not None else None,)
            mod_args[rev_rating] = (self.W_r[rev_rating] if self.W_r is not None else None,)
        out_feats = self.conv(graph, in_feats, mod_args=mod_args)
        ufeat = out_feats['user']
        ifeat = out_feats['movie']
        ufeat = ufeat.view(ufeat.shape[0], -1)
        ifeat = ifeat.view(ifeat.shape[0], -1)
        # fc and non-linear
        ufeat = self.agg_act(ufeat)
        ifeat = self.agg_act(ifeat)
@@ -150,7 +269,10 @@ class GCMCLayer(nn.Module):
        return self.out_act(ufeat), self.out_act(ifeat)
class BiDecoder(nn.Module):
    r"""Bi-linear decoder.

    Given a bipartite graph G, for each edge (i, j) ~ G, compute the likelihood
    of it being class r by:

    .. math::
        p(M_{ij}=r) = \text{softmax}(u_i^T Q_r v_j)

@@ -163,28 +285,27 @@ class BiDecoder(nn.Module):
    Parameters
    ----------
    in_units : int
        Size of input user and movie features
    num_classes : int
        Number of classes.
    num_basis : int, optional
        Number of basis. (Default: 2)
    dropout_rate : float, optional
        Dropout rate (Default: 0.0)
    """
    def __init__(self,
                 in_units,
                 num_classes,
                 num_basis=2,
                 dropout_rate=0.0):
        super(BiDecoder, self).__init__()
        self._num_basis = num_basis
        self.dropout = nn.Dropout(dropout_rate)
        self.Ps = nn.ParameterList()
        for i in range(num_basis):
            self.Ps.append(nn.Parameter(th.randn(in_units, in_units)))
        # combine_basis learns, per class r, coefficients a_{rs} such that
        # Q_r = sum_s a_{rs} P_s
        self.combine_basis = nn.Linear(self._num_basis, num_classes, bias=False)
        self.reset_parameters()
    def reset_parameters(self):
@@ -209,22 +330,83 @@ class BiDecoder(nn.Module):
        th.Tensor
            Predicting scores for each user-movie edge.
        """
        with graph.local_scope():
            ufeat = self.dropout(ufeat)
            ifeat = self.dropout(ifeat)
            graph.nodes['movie'].data['h'] = ifeat
            basis_out = []
            for i in range(self._num_basis):
                graph.nodes['user'].data['h'] = ufeat @ self.Ps[i]
                graph.apply_edges(fn.u_dot_v('h', 'h', 'sr'))
                basis_out.append(graph.edata['sr'].unsqueeze(1))
            out = th.cat(basis_out, dim=1)
            out = self.combine_basis(out)
        return out
class DenseBiDecoder(BiDecoder):
    r"""Dense bi-linear decoder.

    Dense implementation of the bi-linear decoder used in GCMC. Suitable when
    the graph can be efficiently represented by a pair of arrays (one for source
    nodes; one for destination nodes).

    Parameters
    ----------
    in_units : int
        Size of input user and movie features
    num_classes : int
        Number of classes.
    num_basis : int, optional
        Number of basis. (Default: 2)
    dropout_rate : float, optional
        Dropout rate (Default: 0.0)
    """
    def __init__(self,
                 in_units,
                 num_classes,
                 num_basis=2,
                 dropout_rate=0.0):
        super(DenseBiDecoder, self).__init__(in_units,
                                             num_classes,
                                             num_basis,
                                             dropout_rate)

    def forward(self, ufeat, ifeat):
        """Forward function.

        Compute logits for each pair ``(ufeat[i], ifeat[i])``.

        Parameters
        ----------
        ufeat : th.Tensor
            User embeddings. Shape: (B, D)
        ifeat : th.Tensor
            Movie embeddings. Shape: (B, D)

        Returns
        -------
        th.Tensor
            Predicting scores for each user-movie edge. Shape: (B, num_classes)
        """
        ufeat = self.dropout(ufeat)
        ifeat = self.dropout(ifeat)
        basis_out = []
        for i in range(self._num_basis):
            ufeat_i = ufeat @ self.Ps[i]
            # row-wise dot product: out[b] = <ufeat_i[b], ifeat[b]>
            out = th.einsum('ab,ab->a', ufeat_i, ifeat)
            basis_out.append(out.unsqueeze(1))

        out = th.cat(basis_out, dim=1)
        out = self.combine_basis(out)
        return out
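A quick smoke test of the dense decoder's (B, D) to (B, num_classes) contract, with hypothetical sizes and random features (run in model.py's namespace, where `th` is torch):
```python
dec = DenseBiDecoder(in_units=8, num_classes=5, num_basis=2)
u = th.randn(5, 8)      # 5 user embeddings
v = th.randn(5, 8)      # 5 movie embeddings
logits = dec(u, v)      # shape: (5, 5), one score per rating class
```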
def dot_or_identity(A, B, device=None):
    # if A is None, treat as identity matrix
    if A is None:
        return B
    elif len(A.shape) == 1:
        # A is a 1-D tensor of node IDs (one-hot input): gather the needed
        # rows of B and, if requested, move only those rows to `device`
        if device is None:
            return B[A]
        else:
            return B[A].to(device)
    else:
        return A @ B
"""Training script""" """Training GCMC model on the MovieLens data set.
The script loads the full graph to the training device.
"""
import os, time import os, time
import argparse import argparse
import logging import logging
...@@ -8,7 +11,7 @@ import numpy as np ...@@ -8,7 +11,7 @@ import numpy as np
import torch as th import torch as th
import torch.nn as nn import torch.nn as nn
from data import MovieLens from data import MovieLens
from model import GCMCLayer, BiDecoder from model import BiDecoder, GCMCLayer
from utils import get_activation, get_optimizer, torch_total_param_num, torch_net_info, MetricLogger from utils import get_activation, get_optimizer, torch_total_param_num, torch_net_info, MetricLogger
class Net(nn.Module):
@@ -23,10 +26,11 @@ class Net(nn.Module):
            args.gcn_dropout,
            args.gcn_agg_accum,
            agg_act=self._act,
            share_user_item_param=args.share_param,
            device=args.device)
        self.decoder = BiDecoder(in_units=args.gcn_out_units,
                                 num_classes=len(args.rating_vals),
                                 num_basis=args.gen_r_num_basis_func)

    def forward(self, enc_graph, dec_graph, ufeat, ifeat):
        user_out, movie_out = self.encoder(
This diff is collapsed.