Unverified Commit 539335ce authored by Jinjing Zhou, committed by GitHub

DGL Enter (#3690)



* add

* fix

* fix

* fix

* fix

* add

* add

* fix

* fix

* fix

* new loader

* fix

* fix

* fix for 3.6

* fix

* add

* add receipes and also some bug fixes

* fix

* fix

* fix

* fix receipies

* allow AsNodeDataset to work on ogb

* add ut

* many fixes for nodepred-ns pipeline

* receipe for nodepred-ns

* Update enter/README.md
Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com>

* fix layers

* fix

* fix

* fix

* fix

* fix multiple issues

* fix for citation2

* fix comment

* fix

* fix

* clean up

* fix
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com>
parent 80fb4dbe
# DGL-Enter
DGL-Enter is a command-line tool for users to quickly bootstrap models on a range of datasets, while keeping full freedom to customize the pipeline for their own tasks.
## Installation guide
You can install DGL-Enter easily with `pip install dglenter`. Then you should be able to use DGL-Enter from your command line by typing `dgl-enter`:
```
Usage: dgl-enter [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  config  Generate the config files
  export  Export the python file from config
  train   Train the model
```
## Train GraphSAGE on Cora from scratch
Here we'll use one of the most classic models, GraphSAGE, with the Cora citation graph dataset, to show how easy it is to train a model with DGL-Enter.
### Step 1: Use `dgl-enter config` to generate a yaml configuration file
Run `dgl-enter config nodepred --data cora --model sage --cfg cora_sage.yml`. You'll get a configuration file `cora_sage.yml` that includes all the tunable options, annotated with comments.
Optionally, you can adjust the config to achieve better performance. Below is a modified sample based on the template generated by the command above.
The early-stopping part is removed for simplicity.
```yaml
version: 0.0.1
pipeline_name: nodepred
device: cpu
data:
  name: cora
  split_ratio: # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
model:
  name: sage
  embed_size: -1 # The dimension of created embedding table. -1 means using original node embedding
  hidden_size: 16 # Hidden size.
  num_layers: 1 # Number of hidden layers.
  activation: relu # Activation function name under torch.nn.functional
  dropout: 0.5 # Dropout rate.
  aggregator_type: gcn # Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
general_pipeline:
  num_epochs: 200 # Number of training epochs
  eval_period: 5 # Interval epochs between evaluations
  optimizer:
    name: Adam
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
  num_runs: 1 # Number of experiments to run
```
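Since the config is plain YAML, you can also load and tweak it programmatically before training. A minimal sketch (assuming PyYAML is installed; the inline string below stands in for the generated `cora_sage.yml`):

```python
import yaml

# Inline stand-in for part of the generated cora_sage.yml; in practice you
# would parse the file itself, e.g. yaml.safe_load(open("cora_sage.yml")).
CFG_TEXT = """
version: 0.0.1
pipeline_name: nodepred
device: cpu
model:
  name: sage
  hidden_size: 16
general_pipeline:
  num_epochs: 200
  optimizer:
    name: Adam
    lr: 0.01
"""

cfg = yaml.safe_load(CFG_TEXT)
cfg["general_pipeline"]["optimizer"]["lr"] = 0.005  # tweak a hyperparameter
print(cfg["pipeline_name"], cfg["general_pipeline"]["optimizer"]["lr"])
```

Dump the modified dict back with `yaml.safe_dump` and point `--cfg` at the new file.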
### Step 2: Use `dgl-enter train` to initiate the training process.
Simply run `dgl-enter train --cfg cora_sage.yml` to start the training process:
```log
...
Epoch 00190 | Loss 1.5225 | TrainAcc 0.9500 | ValAcc 0.6840
Epoch 00191 | Loss 1.5416 | TrainAcc 0.9357 | ValAcc 0.6840
Epoch 00192 | Loss 1.5391 | TrainAcc 0.9357 | ValAcc 0.6840
Epoch 00193 | Loss 1.5257 | TrainAcc 0.9643 | ValAcc 0.6840
Epoch 00194 | Loss 1.5196 | TrainAcc 0.9286 | ValAcc 0.6840
EarlyStopping counter: 12 out of 20
Epoch 00195 | Loss 1.4862 | TrainAcc 0.9643 | ValAcc 0.6760
Epoch 00196 | Loss 1.5142 | TrainAcc 0.9714 | ValAcc 0.6760
Epoch 00197 | Loss 1.5145 | TrainAcc 0.9714 | ValAcc 0.6760
Epoch 00198 | Loss 1.5174 | TrainAcc 0.9571 | ValAcc 0.6760
Epoch 00199 | Loss 1.5235 | TrainAcc 0.9714 | ValAcc 0.6760
Test Accuracy 0.7740
Accuracy across 1 runs: 0.774 ± 0.0
```
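The last line summarizes test accuracy over `num_runs` experiments. The aggregation is presumably just a mean and standard deviation over the per-run accuracies, e.g.:

```python
import statistics

def summarize(accs):
    """Mean and (population) standard deviation of per-run test accuracies."""
    mean = statistics.fmean(accs)
    std = statistics.pstdev(accs)
    return mean, std

# With a single run, as in the log above, the spread is zero.
mean, std = summarize([0.774])
print(f"Accuracy across {len([0.774])} runs: {mean} \u00b1 {std}")
```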
That's all! You only need two commands to train a graph neural network.
## Debug your model and advanced customization
That's not everything yet. You may want to change more than the configuration file: modify the training pipeline, compute new metrics, or inspect the code in detail.
DGL-Enter can export a self-contained, runnable Python script for you to edit however you like.
Try `dgl-enter export --cfg cora_sage.yml --output script.py`, and you'll get the script used to train the model.
Below is an excerpt:
```python
...
def train(cfg, pipeline_cfg, device, data, model, optimizer, loss_fcn):
    g = data[0]  # Only train on the first graph
    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)
    g = g.to(device)
    node_feat = g.ndata.get('feat', None)
    edge_feat = g.edata.get('feat', None)
    label = g.ndata['label']
    train_mask = g.ndata['train_mask'].bool()
    val_mask = g.ndata['val_mask'].bool()
    test_mask = g.ndata['test_mask'].bool()
    val_acc = 0.
    for epoch in range(pipeline_cfg['num_epochs']):
        model.train()
        logits = model(g, node_feat, edge_feat)
        loss = loss_fcn(logits[train_mask], label[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_acc = accuracy(logits[train_mask], label[train_mask])
        if epoch != 0 and epoch % pipeline_cfg['eval_period'] == 0:
            val_acc = accuracy(logits[val_mask], label[val_mask])
        print("Epoch {:05d} | Loss {:.4f} | TrainAcc {:.4f} | ValAcc {:.4f}".
              format(epoch, loss.item(), train_acc, val_acc))
    model.eval()
    with torch.no_grad():
        logits = model(g, node_feat, edge_feat)
        test_acc = accuracy(logits[test_mask], label[test_mask])
    return test_acc


def main():
    cfg = {
        'version': '0.0.1',
        'device': 'cpu',
        'data': {
            'split_ratio': None},
        'model': {
            'embed_size': -1,
            'hidden_size': 16,
            'num_layers': 1,
            'activation': 'relu',
            'dropout': 0.5,
            'aggregator_type': 'gcn'},
        'general_pipeline': {
            'num_epochs': 200,
            'eval_period': 5,
            'optimizer': {
                'lr': 0.01,
                'weight_decay': 0.0005},
            'loss': 'CrossEntropyLoss',
            'num_runs': 1}}
    device = cfg['device']
    pipeline_cfg = cfg['general_pipeline']
    # load data
    data = AsNodePredDataset(CoraGraphDataset())
    # create model
    model_cfg = cfg["model"]
    cfg["model"]["data_info"] = {
        "in_size": model_cfg['embed_size'] if model_cfg['embed_size'] > 0 else data[0].ndata['feat'].shape[1],
        "out_size": data.num_classes,
        "num_nodes": data[0].num_nodes()
    }
    model = GraphSAGE(**cfg["model"])
    model = model.to(device)
    loss = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        model.parameters(),
        **pipeline_cfg["optimizer"])
    # train
    test_acc = train(cfg, pipeline_cfg, device, data, model, optimizer, loss)
    return test_acc
...
```
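Note that the script calls an `accuracy` helper this excerpt omits; presumably it computes top-1 accuracy from the logits. A framework-free sketch of the same idea (the real script would operate on torch tensors rather than lists):

```python
def accuracy(logits, labels):
    """Fraction of rows whose argmax matches the label.

    logits: list of per-class score lists; labels: list of int class ids.
    Illustrative stand-in for the tensor-based helper in the exported script.
    """
    preds = [row.index(max(row)) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Rows 0 and 1 are predicted correctly, row 2 is not.
print(accuracy([[0.1, 0.9], [2.0, -1.0], [0.3, 0.4]], [1, 0, 0]))
```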
## Recipes
We've prepared a set of fine-tuned configs under `enter/recipes` that you can use to reproduce results easily.
For example, to use GCN with the PubMed dataset, take `enter/recipes/nodepred_pubmed_gcn.yml`.
To try it, run `dgl-enter train --cfg recipes/nodepred_pubmed_gcn.yml` to train the model, or `dgl-enter export --cfg recipes/nodepred_pubmed_gcn.yml` to get the full training script.
## Use DGL-Enter on your own dataset
You can modify the generated script in any way you want. However, we also provide an end-to-end way to use your own dataset via our `CSVDataset`.
Step 1: Prepare your CSV and metadata files.
Following the tutorial at [Loading data from CSV files](https://docs.dgl.ai/en/latest/guide/data-loadcsv.html#guide-data-pipeline-loadcsv), prepare your own CSV dataset, which minimally includes three files: a node data CSV, an edge data CSV, and the metadata file (meta.yml).
```yml
dataset_name: my_csv_dataset
edge_data:
  - file_name: edges.csv
node_data:
  - file_name: nodes.csv
```
Step 2: Choose the CSV dataset in the `dgl-enter config` stage
Try `dgl-enter config nodepred --data csv --model sage --cfg csv_sage.yml` to use the SAGE model on your dataset. You'll see that the data section now holds the CSV-dataset configuration: `data_path` specifies the data folder, and `./` means the current folder.
If your dataset doesn't have a built-in train/val/test split on the nodes, you need to set the split ratio in the config YAML file manually, and DGL will randomly generate the split for you.
```yml
data:
  name: csv
  split_ratio: # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
  data_path: ./ # metadata.yaml, nodes.csv, edges.csv should be in this folder
```
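How DGL generates the split internally is not shown here, but the idea behind `split_ratio` can be sketched as a random permutation cut at the given fractions (a hypothetical helper, not DGL's actual code):

```python
import random

def make_split_masks(num_nodes, split_ratio, seed=0):
    """Randomly assign each node to one of train/val/test at the given ratios."""
    train_frac, val_frac, _ = split_ratio
    perm = list(range(num_nodes))
    random.Random(seed).shuffle(perm)
    n_train = int(num_nodes * train_frac)
    n_val = int(num_nodes * val_frac)
    train = [False] * num_nodes
    val = [False] * num_nodes
    test = [False] * num_nodes
    for i in perm[:n_train]:
        train[i] = True
    for i in perm[n_train:n_train + n_val]:
        val[i] = True
    for i in perm[n_train + n_val:]:  # everything left over goes to test
        test[i] = True
    return train, val, test

train, val, test = make_split_masks(10, [0.8, 0.1, 0.1])
print(sum(train), sum(val), sum(test))  # 8 1 1
```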
Step 3: `train` the model / `export` the script
Then you can proceed as in the tutorial above: either train the model with `dgl-enter train --cfg csv_sage.yml` or use `dgl-enter export --cfg csv_sage.yml --output my_dataset.py` to get the training script.
## API Reference
DGL-Enter is a new tool for users to bootstrap datasets and common models.
The entry point is `dgl-enter`, and it has three subcommands: `config`, `train` and `export`.
### Config
The config stage generates a configuration file for a specific pipeline.
`dgl-enter` currently provides three pipelines:
- nodepred (node prediction tasks, suitable for prototyping on small datasets)
- nodepred-ns (node prediction tasks with neighbor sampling, suitable for medium and large datasets)
- linkpred (link prediction tasks: predict whether an edge exists between a pair of nodes based on node features)
You can get the full list with `dgl-enter config --help`:
```
Usage: dgl-enter config [OPTIONS] COMMAND [ARGS]...

  Generate the config files

Options:
  --help  Show this message and exit.

Commands:
  linkpred     Link prediction pipeline
  nodepred     Node classification pipeline
  nodepred-ns  Node classification sampling pipeline
```
Each pipeline has different options to specify. For example, for the node prediction pipeline, run `dgl-enter config nodepred --help` and you'll get:
```
Usage: dgl-enter config nodepred [OPTIONS]

  Node classification pipeline

Options:
  --data [cora|citeseer|ogbl-collab|csv|reddit|co-buy-computer]
                                  input data name  [required]
  --cfg TEXT                      output configuration path  [default:
                                  cfg.yml]
  --model [gcn|gat|sage|sgc|gin]  Model name  [required]
  --device [cpu|cuda]             Device, cpu or cuda  [default: cpu]
  --help                          Show this message and exit.
```
You can always get detailed help information by adding `--help` to any command.
### Train
With `dgl-enter train`, you can train a model on a dataset based on the configuration file generated by `dgl-enter config`.
```
Usage: dgl-enter train [OPTIONS]

  Train the model

Options:
  --cfg TEXT  yaml file name  [default: cfg.yml]
  --help      Show this message and exit.
```
### Export
Get the self-contained, runnable Python script derived from the configuration file via `dgl-enter export`.
from .cli import app
if __name__ == "__main__":
    app()
import typer
from ..pipeline import *
from ..model import *
from .config_cli import config_app
from .train_cli import train
from .export_cli import export
no_args_is_help = False
app = typer.Typer(no_args_is_help=no_args_is_help, add_completion=False)
app.add_typer(config_app, name="config", no_args_is_help=no_args_is_help)
app.command(help="Train the model", no_args_is_help=no_args_is_help)(train)
app.command(help="Export the python file from config", no_args_is_help=no_args_is_help)(export)
def main():
    app()


if __name__ == "__main__":
    app()
from ..pipeline import *
from ..utils.factory import ModelFactory, PipelineFactory
import typer
from enum import Enum
import typing
import yaml
from pathlib import Path
config_app = typer.Typer(help="Generate the config files")
for key, pipeline in PipelineFactory.registry.items():
    config_app.command(key, help=pipeline.get_description())(pipeline.get_cfg_func())

if __name__ == "__main__":
    config_app()
from ..utils.factory import ModelFactory, PipelineFactory
from ..utils.enter_config import UserConfig
import typer
from enum import Enum
import typing
import yaml
from pathlib import Path
import isort
import autopep8
def export(
        cfg: str = typer.Option("cfg.yml", help="config yaml file name"),
        output: str = typer.Option("output.py", help="output python file name")
):
    user_cfg = yaml.safe_load(Path(cfg).open("r"))
    pipeline_name = user_cfg["pipeline_name"]
    output_file_content = PipelineFactory.registry[pipeline_name].gen_script(user_cfg)
    f_code = autopep8.fix_code(output_file_content, options={'aggressive': 1})
    f_code = isort.code(f_code)
    with open(output, "w") as f:
        f.write(f_code)
    print("The python script is generated at {}, based on config file {}".format(
        Path(output).absolute(), Path(cfg).absolute()))


if __name__ == "__main__":
    export_app = typer.Typer()
    export_app.command()(export)
    export_app()
from ..utils.factory import ModelFactory, PipelineFactory
from ..utils.enter_config import UserConfig
import typer
from enum import Enum
import typing
import yaml
from pathlib import Path
import isort
import autopep8
def train(
        cfg: str = typer.Option("cfg.yml", help="config yaml file name"),
):
    user_cfg = yaml.safe_load(Path(cfg).open("r"))
    pipeline_name = user_cfg["pipeline_name"]
    output_file_content = PipelineFactory.registry[pipeline_name].gen_script(user_cfg)
    f_code = autopep8.fix_code(output_file_content, options={'aggressive': 1})
    f_code = isort.code(f_code)
    exec(f_code, {'__name__': '__main__'})


if __name__ == "__main__":
    train_app = typer.Typer()
    train_app.command()(train)
    train_app()
from .node_encoder import *
from .edge_encoder import *
from ...utils.factory import EdgeModelFactory
from .ele import ElementWiseProductPredictor
from .bilinear import BilinearPredictor
EdgeModelFactory.register("ele")(ElementWiseProductPredictor)
EdgeModelFactory.register("bilinear")(BilinearPredictor)
import torch
import torch.nn as nn
import torch.nn.functional as F
class BilinearPredictor(nn.Module):
    def __init__(self,
                 data_info: dict,
                 hidden_size: int = 32,
                 num_layers: int = 1,
                 bias: bool = True):
        """Bilinear product model for edge scores

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of hidden layers.
        bias : bool
            Whether to use bias in the linear layer.
        """
        super(BilinearPredictor, self).__init__()
        in_size, out_size = data_info["in_size"], data_info["out_size"]
        self.bilinear = nn.Bilinear(in_size, in_size, hidden_size, bias=bias)
        lins_list = []
        for _ in range(num_layers - 2):
            lins_list.append(nn.Linear(hidden_size, hidden_size, bias=bias))
            lins_list.append(nn.ReLU())
        lins_list.append(nn.Linear(hidden_size, out_size, bias=bias))
        self.linear = nn.Sequential(*lins_list)

    def forward(self, h_src, h_dst):
        h = self.bilinear(h_src, h_dst)
        h = self.linear(h)
        h = torch.sigmoid(h)
        return h
import torch
import torch.nn as nn
import torch.nn.functional as F
class DotPredictor(nn.Module):
    def __init__(self,
                 in_size: int = -1,
                 out_size: int = 1,
                 hidden_size: int = 256,
                 num_layers: int = 3,
                 bias: bool = False):
        super(DotPredictor, self).__init__()
        lins_list = []
        for i in range(num_layers - 2):
            # The first hidden layer maps from in_size; later ones keep hidden_size
            lins_list.append(nn.Linear(in_size if i == 0 else hidden_size,
                                       hidden_size, bias=bias))
            lins_list.append(nn.ReLU())
        lins_list.append(nn.Linear(hidden_size, out_size, bias=bias))
        self.linear = nn.Sequential(*lins_list)

    def forward(self, h_src, h_dst):
        h = h_src * h_dst
        h = self.linear(h)
        h = torch.sigmoid(h)
        return h
import torch
import torch.nn as nn
import torch.nn.functional as F
class ElementWiseProductPredictor(nn.Module):
    def __init__(self,
                 data_info: dict,
                 hidden_size: int = 64,
                 num_layers: int = 2,
                 bias: bool = True):
        """Elementwise product model for edge scores

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of hidden layers.
        bias : bool
            Whether to use bias in the linear layer.
        """
        super(ElementWiseProductPredictor, self).__init__()
        lins_list = []
        in_size, out_size = data_info["in_size"], data_info["out_size"]
        for i in range(num_layers):
            in_hidden = in_size if i == 0 else hidden_size
            out_hidden = hidden_size if i < num_layers - 1 else out_size
            lins_list.append(nn.Linear(in_hidden, out_hidden, bias=bias))
            if i < num_layers - 1:
                lins_list.append(nn.ReLU())
        self.linear = nn.Sequential(*lins_list)

    def forward(self, h_src, h_dst):
        h = h_src * h_dst
        h = self.linear(h)
        h = torch.sigmoid(h)
        return h
from .gcn import GCN
from .gat import GAT
from .sage import GraphSAGE
from .sgc import SGC
from .gin import GIN
from ...utils.factory import NodeModelFactory
NodeModelFactory.register("gcn")(GCN)
NodeModelFactory.register("gat")(GAT)
NodeModelFactory.register("sage")(GraphSAGE)
NodeModelFactory.register("sgc")(SGC)
NodeModelFactory.register("gin")(GIN)
from typing import List
import torch
import torch.nn as nn
import dgl.function as fn
import torch.nn.functional as F
from dgl.nn import GATConv
from dgl.base import dgl_warning
class GAT(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 num_layers: int = 2,
                 hidden_size: int = 8,
                 heads: List[int] = [8, 8],
                 activation: str = "elu",
                 feat_drop: float = 0.6,
                 attn_drop: float = 0.6,
                 negative_slope: float = 0.2,
                 residual: bool = False):
        """Graph Attention Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of layers.
        heads : List[int]
            Number of attention heads for each layer.
        activation : str
            Activation function.
        feat_drop : float
            Dropout rate for features.
        attn_drop : float
            Dropout rate for attentions.
        negative_slope : float
            Negative slope for leaky relu in GATConv
        residual : bool
            If True, the output GATConv layer uses a residual connection
        """
        super(GAT, self).__init__()
        self.data_info = data_info
        self.embed_size = embed_size
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = getattr(torch.nn.functional, activation)
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        for i in range(num_layers):
            in_hidden = hidden_size * heads[i - 1] if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
            # The output layer honors the residual flag and has no activation
            use_residual = residual if i == num_layers - 1 else False
            activation = None if i == num_layers - 1 else self.activation
            self.gat_layers.append(GATConv(
                in_hidden, out_hidden, heads[i],
                feat_drop, attn_drop, negative_slope, use_residual, activation))

    def forward(self, graph, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning(
                "The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        for l in range(self.num_layers - 1):
            h = self.gat_layers[l](graph, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](graph, h).mean(1)
        return logits

    def forward_block(self, blocks, node_feat, edge_feat=None):
        h = node_feat
        for l in range(self.num_layers - 1):
            h = self.gat_layers[l](blocks[l], h).flatten(1)
        logits = self.gat_layers[-1](blocks[-1], h).mean(1)
        return logits
import torch
import torch.nn as nn
import dgl
from dgl.base import dgl_warning
class GCN(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 hidden_size: int = 16,
                 num_layers: int = 1,
                 norm: str = "both",
                 activation: str = "relu",
                 dropout: float = 0.5,
                 use_edge_weight: bool = False):
        """Graph Convolutional Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of layers.
        norm : str
            GCN normalization type. Can be 'both', 'right', 'left', 'none'.
        activation : str
            Activation function.
        dropout : float
            Dropout rate.
        use_edge_weight : bool
            If true, scale the messages by edge weights.
        """
        super().__init__()
        self.use_edge_weight = use_edge_weight
        self.data_info = data_info
        self.embed_size = embed_size
        self.layers = nn.ModuleList()
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        for i in range(num_layers):
            in_hidden = hidden_size if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
            self.layers.append(dgl.nn.GraphConv(in_hidden, out_hidden, norm=norm))
        self.dropout = nn.Dropout(p=dropout)
        self.act = getattr(torch, activation)

    def forward(self, g, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning("The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        edge_weight = edge_feat if self.use_edge_weight else None
        for l, layer in enumerate(self.layers):
            h = layer(g, h, edge_weight=edge_weight)
            if l != len(self.layers) - 1:
                h = self.act(h)
                h = self.dropout(h)
        return h

    def forward_block(self, blocks, node_feat, edge_feat=None):
        h = node_feat
        edge_weight = edge_feat if self.use_edge_weight else None
        for l, (layer, block) in enumerate(zip(self.layers, blocks)):
            h = layer(block, h, edge_weight=edge_weight)
            if l != len(self.layers) - 1:
                h = self.act(h)
                h = self.dropout(h)
        return h
import torch.nn as nn
from dgl.nn import GINConv
from dgl.base import dgl_warning
class GIN(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 hidden_size=64,
                 num_layers=3,
                 aggregator_type='sum'):
        """Graph Isomorphism Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of layers.
        aggregator_type : str
            Aggregator type to use (``sum``, ``max`` or ``mean``), default: 'sum'.
        """
        super().__init__()
        self.data_info = data_info
        self.embed_size = embed_size
        self.conv_list = nn.ModuleList()
        self.num_layers = num_layers
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        for i in range(num_layers):
            input_dim = in_size if i == 0 else hidden_size
            mlp = nn.Sequential(nn.Linear(input_dim, hidden_size),
                                nn.BatchNorm1d(hidden_size), nn.ReLU(),
                                nn.Linear(hidden_size, hidden_size), nn.ReLU())
            self.conv_list.append(GINConv(mlp, aggregator_type, 1e-5, True))
        self.out_mlp = nn.Linear(hidden_size, data_info["out_size"])

    def forward(self, graph, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning(
                "The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        for i in range(self.num_layers):
            h = self.conv_list[i](graph, h)
        h = self.out_mlp(h)
        return h
import torch.nn as nn
import dgl
from dgl.base import dgl_warning
class GraphSAGE(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 hidden_size: int = 16,
                 num_layers: int = 1,
                 activation: str = "relu",
                 dropout: float = 0.5,
                 aggregator_type: str = "gcn"):
        """GraphSAGE model

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of hidden layers.
        dropout : float
            Dropout rate.
        activation : str
            Activation function name under torch.nn.functional
        aggregator_type : str
            Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
        """
        super(GraphSAGE, self).__init__()
        self.data_info = data_info
        self.embed_size = embed_size
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        self.layers = nn.ModuleList()
        self.dropout = nn.Dropout(dropout)
        self.activation = getattr(nn.functional, activation)
        for i in range(num_layers):
            in_hidden = hidden_size if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
            self.layers.append(dgl.nn.SAGEConv(in_hidden, out_hidden, aggregator_type))

    def forward(self, graph, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning("The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        h = self.dropout(h)
        for l, layer in enumerate(self.layers):
            h = layer(graph, h)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h

    def forward_block(self, blocks, node_feat, edge_feat=None):
        h = node_feat
        for l, (layer, block) in enumerate(zip(self.layers, blocks)):
            h = layer(block, h)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h
import torch
import torch.nn as nn
import dgl.function as fn
import torch.nn.functional as F
from dgl.nn import SGConv
from dgl.base import dgl_warning
class SGC(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 bias=True, k=2):
        """Simplifying Graph Convolutional Networks

        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        bias : bool
            If True, adds a learnable bias to the output. Default: ``True``.
        k : int
            Number of hops :math:`K`. Defaults: ``1``.
        """
        super().__init__()
        self.data_info = data_info
        self.out_size = data_info["out_size"]
        self.embed_size = embed_size
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            # SGConv consumes the embedding instead of the raw node features
            self.in_size = embed_size
        else:
            self.in_size = data_info["in_size"]
        self.sgc = SGConv(self.in_size, self.out_size, k=k, cached=True,
                          bias=bias, norm=self.normalize)

    def forward(self, g, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning("The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.", norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        return self.sgc(g, h)

    @staticmethod
    def normalize(h):
        return (h - h.mean(0)) / (h.std(0) + 1e-5)
from .nodepred import NodepredPipeline
from .nodepred_sample import NodepredNsPipeline
from .linkpred import LinkpredPipeline
from .gen import *