Unverified Commit 539335ce authored by Jinjing Zhou, committed by GitHub

DGL Enter (#3690)



* add

* fix

* fix

* fix

* fix

* add

* add

* fix

* fix

* fix

* new loader

* fix

* fix

* fix for 3.6

* fix

* add

* add receipes and also some bug fixes

* fix

* fix

* fix

* fix receipies

* allow AsNodeDataset to work on ogb

* add ut

* many fixes for nodepred-ns pipeline

* receipe for nodepred-ns

* Update enter/README.md
Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com>

* fix layers

* fix

* fix

* fix

* fix

* fix multiple issues

* fix for citation2

* fix comment

* fix

* fix

* clean up

* fix
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com>
parent 80fb4dbe
# DGL-Enter
DGL-Enter is a command-line tool for users to quickly bootstrap models on a range of datasets, while keeping full freedom to customize the pipeline for their own tasks.
## Installation guide
You can install DGL-Enter easily with `pip install dglenter`. Then you should be able to use DGL-Enter from your command line by typing `dgl-enter`:
```
Usage: dgl-enter [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  config  Generate the config files
  export  Export the python file from config
  train   Train the model
```
## Train GraphSAGE on Cora from scratch
Here we'll use one of the most classic models, GraphSAGE, with the Cora citation graph dataset, to show how easy it is to train a model with DGL-Enter.
### Step 1: Use `dgl-enter config` to generate a yaml configuration file
Run `dgl-enter config nodepred --data cora --model sage --cfg cora_sage.yml`. You'll get a configuration file `cora_sage.yml` that includes all the tunable options, annotated with comments.
Optionally, you can adjust the config to achieve better performance. Below is a modified sample based on the template generated by the command above.
The early-stopping part is removed for simplicity.
```yaml
version: 0.0.1
pipeline_name: nodepred
device: cpu
data:
  name: cora
  split_ratio: # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
model:
  name: sage
  embed_size: -1 # The dimension of created embedding table. -1 means using original node embedding
  hidden_size: 16 # Hidden size.
  num_layers: 1 # Number of hidden layers.
  activation: relu # Activation function name under torch.nn.functional
  dropout: 0.5 # Dropout rate.
  aggregator_type: gcn # Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
general_pipeline:
  num_epochs: 200 # Number of training epochs
  eval_period: 5 # Interval epochs between evaluations
  optimizer:
    name: Adam
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
  num_runs: 1 # Number of experiments to run
```
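Since the config is plain YAML, you can also load and tweak it programmatically before training. A minimal sketch (assuming PyYAML is installed; the inline string below stands in for the generated `cora_sage.yml`):

```python
import yaml

# Inline stand-in for part of the generated cora_sage.yml; in practice you
# would parse the file itself, e.g. yaml.safe_load(open("cora_sage.yml")).
CFG_TEXT = """
version: 0.0.1
pipeline_name: nodepred
device: cpu
model:
  name: sage
  hidden_size: 16
general_pipeline:
  num_epochs: 200
  optimizer:
    name: Adam
    lr: 0.01
"""

cfg = yaml.safe_load(CFG_TEXT)
cfg["general_pipeline"]["optimizer"]["lr"] = 0.005  # tweak a hyperparameter
print(cfg["pipeline_name"], cfg["general_pipeline"]["optimizer"]["lr"])
```

Dump the modified dict back with `yaml.safe_dump` and point `--cfg` at the new file.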
### Step 2: Use `dgl-enter train` to initiate the training process.
Simply run `dgl-enter train --cfg cora_sage.yml` to start the training process:
```log
...
Epoch 00190 | Loss 1.5225 | TrainAcc 0.9500 | ValAcc 0.6840
Epoch 00191 | Loss 1.5416 | TrainAcc 0.9357 | ValAcc 0.6840
Epoch 00192 | Loss 1.5391 | TrainAcc 0.9357 | ValAcc 0.6840
Epoch 00193 | Loss 1.5257 | TrainAcc 0.9643 | ValAcc 0.6840
Epoch 00194 | Loss 1.5196 | TrainAcc 0.9286 | ValAcc 0.6840
EarlyStopping counter: 12 out of 20
Epoch 00195 | Loss 1.4862 | TrainAcc 0.9643 | ValAcc 0.6760
Epoch 00196 | Loss 1.5142 | TrainAcc 0.9714 | ValAcc 0.6760
Epoch 00197 | Loss 1.5145 | TrainAcc 0.9714 | ValAcc 0.6760
Epoch 00198 | Loss 1.5174 | TrainAcc 0.9571 | ValAcc 0.6760
Epoch 00199 | Loss 1.5235 | TrainAcc 0.9714 | ValAcc 0.6760
Test Accuracy 0.7740
Accuracy across 1 runs: 0.774 ± 0.0
```
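The last line summarizes test accuracy over `num_runs` experiments. The aggregation is presumably just a mean and standard deviation over the per-run accuracies, e.g.:

```python
import statistics

def summarize(accs):
    """Mean and (population) standard deviation of per-run test accuracies."""
    mean = statistics.fmean(accs)
    std = statistics.pstdev(accs)
    return mean, std

# With a single run, as in the log above, the spread is zero.
mean, std = summarize([0.774])
print(f"Accuracy across {len([0.774])} runs: {mean} \u00b1 {std}")
```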
That's all! You only need two commands to train a graph neural network.
## Debug your model and advanced customization
That's not everything yet. You may want to change more than the configuration file: modify the training pipeline, compute new metrics, or inspect the code in detail.
DGL-Enter can export a self-contained, runnable Python script for you to edit however you like.
Try `dgl-enter export --cfg cora_sage.yml --output script.py`, and you'll get the script used to train the model.
Below is an excerpt:
```python
...
def train(cfg, pipeline_cfg, device, data, model, optimizer, loss_fcn):
    g = data[0]  # Only train on the first graph
    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)
    g = g.to(device)
    node_feat = g.ndata.get('feat', None)
    edge_feat = g.edata.get('feat', None)
    label = g.ndata['label']
    train_mask = g.ndata['train_mask'].bool()
    val_mask = g.ndata['val_mask'].bool()
    test_mask = g.ndata['test_mask'].bool()
    val_acc = 0.
    for epoch in range(pipeline_cfg['num_epochs']):
        model.train()
        logits = model(g, node_feat, edge_feat)
        loss = loss_fcn(logits[train_mask], label[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_acc = accuracy(logits[train_mask], label[train_mask])
        if epoch != 0 and epoch % pipeline_cfg['eval_period'] == 0:
            val_acc = accuracy(logits[val_mask], label[val_mask])
        print("Epoch {:05d} | Loss {:.4f} | TrainAcc {:.4f} | ValAcc {:.4f}".
              format(epoch, loss.item(), train_acc, val_acc))
    model.eval()
    with torch.no_grad():
        logits = model(g, node_feat, edge_feat)
        test_acc = accuracy(logits[test_mask], label[test_mask])
    return test_acc


def main():
    cfg = {
        'version': '0.0.1',
        'device': 'cpu',
        'data': {
            'split_ratio': None},
        'model': {
            'embed_size': -1,
            'hidden_size': 16,
            'num_layers': 1,
            'activation': 'relu',
            'dropout': 0.5,
            'aggregator_type': 'gcn'},
        'general_pipeline': {
            'num_epochs': 200,
            'eval_period': 5,
            'optimizer': {
                'lr': 0.01,
                'weight_decay': 0.0005},
            'loss': 'CrossEntropyLoss',
            'num_runs': 1}}
    device = cfg['device']
    pipeline_cfg = cfg['general_pipeline']
    # load data
    data = AsNodePredDataset(CoraGraphDataset())
    # create model
    model_cfg = cfg["model"]
    cfg["model"]["data_info"] = {
        "in_size": model_cfg['embed_size'] if model_cfg['embed_size'] > 0 else data[0].ndata['feat'].shape[1],
        "out_size": data.num_classes,
        "num_nodes": data[0].num_nodes()
    }
    model = GraphSAGE(**cfg["model"])
    model = model.to(device)
    loss = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        model.parameters(),
        **pipeline_cfg["optimizer"])
    # train
    test_acc = train(cfg, pipeline_cfg, device, data, model, optimizer, loss)
    return test_acc
...
```
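Note that the script calls an `accuracy` helper this excerpt omits; presumably it computes top-1 accuracy from the logits. A framework-free sketch of the same idea (the real script would operate on torch tensors rather than lists):

```python
def accuracy(logits, labels):
    """Fraction of rows whose argmax matches the label.

    logits: list of per-class score lists; labels: list of int class ids.
    Illustrative stand-in for the tensor-based helper in the exported script.
    """
    preds = [row.index(max(row)) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Rows 0 and 1 are predicted correctly, row 2 is not.
print(accuracy([[0.1, 0.9], [2.0, -1.0], [0.3, 0.4]], [1, 0, 0]))
```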
## Recipes
We've prepared a set of fine-tuned configs under `enter/recipes` that you can use to reproduce results easily.
For example, to use GCN with the PubMed dataset, take `enter/recipes/nodepred_pubmed_gcn.yml`.
To try it, run `dgl-enter train --cfg recipes/nodepred_pubmed_gcn.yml` to train the model, or `dgl-enter export --cfg recipes/nodepred_pubmed_gcn.yml` to get the full training script.
## Use DGL-Enter on your own dataset
You can modify the generated script in any way you want. However, we also provide an end-to-end way to use your own dataset via our `CSVDataset`.
Step 1: Prepare your CSV and metadata files.
Following the tutorial at [Loading data from CSV files](https://docs.dgl.ai/en/latest/guide/data-loadcsv.html#guide-data-pipeline-loadcsv), prepare your own CSV dataset, which minimally includes three files: a node data CSV, an edge data CSV, and the metadata file (meta.yml).
```yml
dataset_name: my_csv_dataset
edge_data:
  - file_name: edges.csv
node_data:
  - file_name: nodes.csv
```
Step 2: Choose the CSV dataset in the `dgl-enter config` stage
Try `dgl-enter config nodepred --data csv --model sage --cfg csv_sage.yml` to use the SAGE model on your dataset. You'll see that the data section now holds the CSV-dataset configuration: `data_path` specifies the data folder, and `./` means the current folder.
If your dataset doesn't have a built-in train/val/test split on the nodes, you need to set the split ratio in the config YAML file manually, and DGL will randomly generate the split for you.
```yml
data:
  name: csv
  split_ratio: # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
  data_path: ./ # metadata.yaml, nodes.csv, edges.csv should be in this folder
```
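How DGL generates the split internally is not shown here, but the idea behind `split_ratio` can be sketched as a random permutation cut at the given fractions (a hypothetical helper, not DGL's actual code):

```python
import random

def make_split_masks(num_nodes, split_ratio, seed=0):
    """Randomly assign each node to one of train/val/test at the given ratios."""
    train_frac, val_frac, _ = split_ratio
    perm = list(range(num_nodes))
    random.Random(seed).shuffle(perm)
    n_train = int(num_nodes * train_frac)
    n_val = int(num_nodes * val_frac)
    train = [False] * num_nodes
    val = [False] * num_nodes
    test = [False] * num_nodes
    for i in perm[:n_train]:
        train[i] = True
    for i in perm[n_train:n_train + n_val]:
        val[i] = True
    for i in perm[n_train + n_val:]:  # everything left over goes to test
        test[i] = True
    return train, val, test

train, val, test = make_split_masks(10, [0.8, 0.1, 0.1])
print(sum(train), sum(val), sum(test))  # 8 1 1
```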
Step 3: `train` the model / `export` the script
Then you can proceed as in the tutorial above: either train the model with `dgl-enter train --cfg csv_sage.yml` or use `dgl-enter export --cfg csv_sage.yml --output my_dataset.py` to get the training script.
## API Reference
DGL-Enter is a new tool for users to bootstrap datasets and common models.
The entry point is `dgl-enter`, and it has three subcommands: `config`, `train` and `export`.
### Config
The config stage generates a configuration file for a specific pipeline.
`dgl-enter` currently provides three pipelines:
- nodepred (node prediction tasks, suitable for prototyping on small datasets)
- nodepred-ns (node prediction tasks with neighbor sampling, suitable for medium and large datasets)
- linkpred (link prediction tasks: predict whether an edge exists between a pair of nodes based on node features)
You can get the full list with `dgl-enter config --help`:
```
Usage: dgl-enter config [OPTIONS] COMMAND [ARGS]...

  Generate the config files

Options:
  --help  Show this message and exit.

Commands:
  linkpred     Link prediction pipeline
  nodepred     Node classification pipeline
  nodepred-ns  Node classification sampling pipeline
```
Each pipeline has different options to specify. For example, for the node prediction pipeline, run `dgl-enter config nodepred --help` and you'll get:
```
Usage: dgl-enter config nodepred [OPTIONS]

  Node classification pipeline

Options:
  --data [cora|citeseer|ogbl-collab|csv|reddit|co-buy-computer]
                                  input data name  [required]
  --cfg TEXT                      output configuration path  [default:
                                  cfg.yml]
  --model [gcn|gat|sage|sgc|gin]  Model name  [required]
  --device [cpu|cuda]             Device, cpu or cuda  [default: cpu]
  --help                          Show this message and exit.
```
You can always get detailed help information by adding `--help` to any command.
### Train
With `dgl-enter train`, you can train a model on a dataset based on the configuration file generated by `dgl-enter config`.
```
Usage: dgl-enter train [OPTIONS]

  Train the model

Options:
  --cfg TEXT  yaml file name  [default: cfg.yml]
  --help      Show this message and exit.
```
### Export
Get the self-contained, runnable Python script derived from the configuration file via `dgl-enter export`.
from .cli import app
if __name__ == "__main__":
    app()
import typer
from ..pipeline import *
from ..model import *
from .config_cli import config_app
from .train_cli import train
from .export_cli import export
no_args_is_help = False
app = typer.Typer(no_args_is_help=no_args_is_help, add_completion=False)
app.add_typer(config_app, name="config", no_args_is_help=no_args_is_help)
app.command(help="Train the model", no_args_is_help=no_args_is_help)(train)
app.command(help="Export the python file from config", no_args_is_help=no_args_is_help)(export)
def main():
    app()


if __name__ == "__main__":
    app()
from ..pipeline import *
from ..utils.factory import ModelFactory, PipelineFactory
import typer
from enum import Enum
import typing
import yaml
from pathlib import Path
config_app = typer.Typer(help="Generate the config files")
for key, pipeline in PipelineFactory.registry.items():
    config_app.command(key, help=pipeline.get_description())(pipeline.get_cfg_func())

if __name__ == "__main__":
    config_app()
from ..utils.factory import ModelFactory, PipelineFactory
from ..utils.enter_config import UserConfig
import typer
from enum import Enum
import typing
import yaml
from pathlib import Path
import isort
import autopep8
def export(
        cfg: str = typer.Option("cfg.yml", help="config yaml file name"),
        output: str = typer.Option("output.py", help="output python file name")
):
    user_cfg = yaml.safe_load(Path(cfg).open("r"))
    pipeline_name = user_cfg["pipeline_name"]
    output_file_content = PipelineFactory.registry[pipeline_name].gen_script(user_cfg)
    f_code = autopep8.fix_code(output_file_content, options={'aggressive': 1})
    f_code = isort.code(f_code)
    with open(output, "w") as f:
        f.write(f_code)
    print("The python script is generated at {}, based on config file {}".format(
        Path(output).absolute(), Path(cfg).absolute()))


if __name__ == "__main__":
    export_app = typer.Typer()
    export_app.command()(export)
    export_app()
from ..utils.factory import ModelFactory, PipelineFactory
from ..utils.enter_config import UserConfig
import typer
from enum import Enum
import typing
import yaml
from pathlib import Path
import isort
import autopep8
def train(
        cfg: str = typer.Option("cfg.yml", help="config yaml file name"),
):
    user_cfg = yaml.safe_load(Path(cfg).open("r"))
    pipeline_name = user_cfg["pipeline_name"]
    output_file_content = PipelineFactory.registry[pipeline_name].gen_script(user_cfg)
    f_code = autopep8.fix_code(output_file_content, options={'aggressive': 1})
    f_code = isort.code(f_code)
    exec(f_code, {'__name__': '__main__'})


if __name__ == "__main__":
    train_app = typer.Typer()
    train_app.command()(train)
    train_app()
from .node_encoder import *
from .edge_encoder import *
from ...utils.factory import EdgeModelFactory
from .ele import ElementWiseProductPredictor
from .bilinear import BilinearPredictor
EdgeModelFactory.register("ele")(ElementWiseProductPredictor)
EdgeModelFactory.register("bilinear")(BilinearPredictor)
import torch
import torch.nn as nn
import torch.nn.functional as F
class BilinearPredictor(nn.Module):
    def __init__(self,
                 data_info: dict,
                 hidden_size: int = 32,
                 num_layers: int = 1,
                 bias: bool = True):
        """Bilinear product model for edge scores

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of hidden layers.
        bias : bool
            Whether to use bias in the linear layer.
        """
        super(BilinearPredictor, self).__init__()
        in_size, out_size = data_info["in_size"], data_info["out_size"]
        self.bilinear = nn.Bilinear(in_size, in_size, hidden_size, bias=bias)
        lins_list = []
        for _ in range(num_layers - 2):
            lins_list.append(nn.Linear(hidden_size, hidden_size, bias=bias))
            lins_list.append(nn.ReLU())
        lins_list.append(nn.Linear(hidden_size, out_size, bias=bias))
        self.linear = nn.Sequential(*lins_list)

    def forward(self, h_src, h_dst):
        h = self.bilinear(h_src, h_dst)
        h = self.linear(h)
        h = torch.sigmoid(h)
        return h
import torch
import torch.nn as nn
import torch.nn.functional as F
class DotPredictor(nn.Module):
    def __init__(self,
                 in_size: int = -1,
                 out_size: int = 1,
                 hidden_size: int = 256,
                 num_layers: int = 3,
                 bias: bool = False):
        super(DotPredictor, self).__init__()
        lins_list = []
        for i in range(num_layers - 2):
            # The first hidden layer maps from in_size; later ones keep hidden_size
            lins_list.append(nn.Linear(in_size if i == 0 else hidden_size,
                                       hidden_size, bias=bias))
            lins_list.append(nn.ReLU())
        lins_list.append(nn.Linear(hidden_size, out_size, bias=bias))
        self.linear = nn.Sequential(*lins_list)

    def forward(self, h_src, h_dst):
        h = h_src * h_dst
        h = self.linear(h)
        h = torch.sigmoid(h)
        return h
import torch
import torch.nn as nn
import torch.nn.functional as F
class ElementWiseProductPredictor(nn.Module):
    def __init__(self,
                 data_info: dict,
                 hidden_size: int = 64,
                 num_layers: int = 2,
                 bias: bool = True):
        """Elementwise product model for edge scores

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of hidden layers.
        bias : bool
            Whether to use bias in the linear layer.
        """
        super(ElementWiseProductPredictor, self).__init__()
        lins_list = []
        in_size, out_size = data_info["in_size"], data_info["out_size"]
        for i in range(num_layers):
            in_hidden = in_size if i == 0 else hidden_size
            out_hidden = hidden_size if i < num_layers - 1 else out_size
            lins_list.append(nn.Linear(in_hidden, out_hidden, bias=bias))
            if i < num_layers - 1:
                lins_list.append(nn.ReLU())
        self.linear = nn.Sequential(*lins_list)

    def forward(self, h_src, h_dst):
        h = h_src * h_dst
        h = self.linear(h)
        h = torch.sigmoid(h)
        return h
from .gcn import GCN
from .gat import GAT
from .sage import GraphSAGE
from .sgc import SGC
from .gin import GIN
from ...utils.factory import NodeModelFactory
NodeModelFactory.register("gcn")(GCN)
NodeModelFactory.register("gat")(GAT)
NodeModelFactory.register("sage")(GraphSAGE)
NodeModelFactory.register("sgc")(SGC)
NodeModelFactory.register("gin")(GIN)
from typing import List
import torch
import torch.nn as nn
import dgl.function as fn
import torch.nn.functional as F
from dgl.nn import GATConv
from dgl.base import dgl_warning
class GAT(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 num_layers: int = 2,
                 hidden_size: int = 8,
                 heads: List[int] = [8, 8],
                 activation: str = "elu",
                 feat_drop: float = 0.6,
                 attn_drop: float = 0.6,
                 negative_slope: float = 0.2,
                 residual: bool = False):
        """Graph Attention Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of layers.
        heads : List[int]
            Number of attention heads for each layer.
        activation : str
            Activation function.
        feat_drop : float
            Dropout rate for features.
        attn_drop : float
            Dropout rate for attentions.
        negative_slope : float
            Negative slope for leaky relu in GATConv
        residual : bool
            If True, the output GATConv layer uses a residual connection
        """
        super(GAT, self).__init__()
        self.data_info = data_info
        self.embed_size = embed_size
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = getattr(torch.nn.functional, activation)
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        for i in range(num_layers):
            in_hidden = hidden_size * heads[i - 1] if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
            # The output layer honors the residual flag and has no activation
            use_residual = residual if i == num_layers - 1 else False
            activation = None if i == num_layers - 1 else self.activation
            self.gat_layers.append(GATConv(
                in_hidden, out_hidden, heads[i],
                feat_drop, attn_drop, negative_slope, use_residual, activation))

    def forward(self, graph, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning(
                "The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        for l in range(self.num_layers - 1):
            h = self.gat_layers[l](graph, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](graph, h).mean(1)
        return logits

    def forward_block(self, blocks, node_feat, edge_feat=None):
        h = node_feat
        for l in range(self.num_layers - 1):
            h = self.gat_layers[l](blocks[l], h).flatten(1)
        logits = self.gat_layers[-1](blocks[-1], h).mean(1)
        return logits
import torch
import torch.nn as nn
import dgl
from dgl.base import dgl_warning
class GCN(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 hidden_size: int = 16,
                 num_layers: int = 1,
                 norm: str = "both",
                 activation: str = "relu",
                 dropout: float = 0.5,
                 use_edge_weight: bool = False):
        """Graph Convolutional Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of layers.
        norm : str
            GCN normalization type. Can be 'both', 'right', 'left', 'none'.
        activation : str
            Activation function.
        dropout : float
            Dropout rate.
        use_edge_weight : bool
            If true, scale the messages by edge weights.
        """
        super().__init__()
        self.use_edge_weight = use_edge_weight
        self.data_info = data_info
        self.embed_size = embed_size
        self.layers = nn.ModuleList()
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        for i in range(num_layers):
            in_hidden = hidden_size if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
            self.layers.append(dgl.nn.GraphConv(in_hidden, out_hidden, norm=norm))
        self.dropout = nn.Dropout(p=dropout)
        self.act = getattr(torch, activation)

    def forward(self, g, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning("The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        edge_weight = edge_feat if self.use_edge_weight else None
        for l, layer in enumerate(self.layers):
            h = layer(g, h, edge_weight=edge_weight)
            if l != len(self.layers) - 1:
                h = self.act(h)
                h = self.dropout(h)
        return h

    def forward_block(self, blocks, node_feat, edge_feat=None):
        h = node_feat
        edge_weight = edge_feat if self.use_edge_weight else None
        for l, (layer, block) in enumerate(zip(self.layers, blocks)):
            h = layer(block, h, edge_weight=edge_weight)
            if l != len(self.layers) - 1:
                h = self.act(h)
                h = self.dropout(h)
        return h
import torch.nn as nn
from dgl.nn import GINConv
from dgl.base import dgl_warning
class GIN(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 hidden_size=64,
                 num_layers=3,
                 aggregator_type='sum'):
        """Graph Isomorphism Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of layers.
        aggregator_type : str
            Aggregator type to use (``sum``, ``max`` or ``mean``), default: 'sum'.
        """
        super().__init__()
        self.data_info = data_info
        self.embed_size = embed_size
        self.conv_list = nn.ModuleList()
        self.num_layers = num_layers
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        for i in range(num_layers):
            input_dim = in_size if i == 0 else hidden_size
            mlp = nn.Sequential(nn.Linear(input_dim, hidden_size),
                                nn.BatchNorm1d(hidden_size), nn.ReLU(),
                                nn.Linear(hidden_size, hidden_size), nn.ReLU())
            self.conv_list.append(GINConv(mlp, aggregator_type, 1e-5, True))
        self.out_mlp = nn.Linear(hidden_size, data_info["out_size"])

    def forward(self, graph, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning(
                "The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        for i in range(self.num_layers):
            h = self.conv_list[i](graph, h)
        h = self.out_mlp(h)
        return h
import torch.nn as nn
import dgl
from dgl.base import dgl_warning
class GraphSAGE(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 hidden_size: int = 16,
                 num_layers: int = 1,
                 activation: str = "relu",
                 dropout: float = 0.5,
                 aggregator_type: str = "gcn"):
        """GraphSAGE model

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of hidden layers.
        dropout : float
            Dropout rate.
        activation : str
            Activation function name under torch.nn.functional
        aggregator_type : str
            Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
        """
        super(GraphSAGE, self).__init__()
        self.data_info = data_info
        self.embed_size = embed_size
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        self.layers = nn.ModuleList()
        self.dropout = nn.Dropout(dropout)
        self.activation = getattr(nn.functional, activation)
        for i in range(num_layers):
            in_hidden = hidden_size if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
            self.layers.append(dgl.nn.SAGEConv(in_hidden, out_hidden, aggregator_type))

    def forward(self, graph, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning("The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        h = self.dropout(h)
        for l, layer in enumerate(self.layers):
            h = layer(graph, h)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h

    def forward_block(self, blocks, node_feat, edge_feat=None):
        h = node_feat
        for l, (layer, block) in enumerate(zip(self.layers, blocks)):
            h = layer(block, h)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h
import torch
import torch.nn as nn
import dgl.function as fn
import torch.nn.functional as F
from dgl.nn import SGConv
from dgl.base import dgl_warning
class SGC(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 bias=True, k=2):
        """Simplifying Graph Convolutional Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        bias : bool
            If True, adds a learnable bias to the output. Default: ``True``.
        k : int
            Number of hops :math:`K`. Defaults: ``1``.
        """
        super().__init__()
        self.data_info = data_info
        self.out_size = data_info["out_size"]
        self.embed_size = embed_size
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            # SGConv consumes the embedding instead of the raw node features
            self.in_size = embed_size
        else:
            self.in_size = data_info["in_size"]
        self.sgc = SGConv(self.in_size, self.out_size, k=k, cached=True,
                          bias=bias, norm=self.normalize)

    def forward(self, g, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning("The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        return self.sgc(g, h)

    @staticmethod
    def normalize(h):
        return (h - h.mean(0)) / (h.std(0) + 1e-5)
from .nodepred import NodepredPipeline
from .nodepred_sample import NodepredNsPipeline
from .linkpred import LinkpredPipeline
from .gen import *