Unverified Commit b4ad59d7 authored by Hengrui Zhang, committed by GitHub

[Model] reformat and update performance of grace (#3009)



* reformat and update performance

* Update README.md

* Fix typos
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
parent 63aed582
@@ -32,25 +32,47 @@ This example was implemented by [Hengrui Zhang](https://github.com/hengruizhang9

--dataname str The graph dataset name. Default is 'cora'.
--gpu int GPU index. Default is 0.
--split str Dataset splitting method. Default is 'random'.
--epochs int Number of training epochs. Default is 500.
--lr float Learning rate. Default is 0.001.
--wd float Weight decay. Default is 1e-5.
--temp float Temperature. Default is 1.0.
--act_fn str Activation function. Default is relu.
--hid_dim int Hidden dimension. Default is 256.
--out_dim int Output dimension. Default is 256.
--num_layers int Number of GNN layers. Default is 2.
--der1 float Drop edge ratio 1. Default is 0.2.
--der2 float Drop edge ratio 2. Default is 0.2.
--dfr1 float Drop feature ratio 1. Default is 0.2.
--dfr2 float Drop feature ratio 2. Default is 0.2.
```

## How to run examples

In the paper (as well as in the authors' repo), the training and test sets are split randomly with a 1:9 ratio. To compare fairly with other methods evaluated on the public split (20 training nodes per class), this repo also provides results on the public split, with fine-tuned hyper-parameters. To run the examples, follow the instructions below.

```python
# Cora with random split
python main.py --dataname cora --epochs 200 --lr 5e-4 --wd 1e-5 --hid_dim 128 --out_dim 128 --act_fn relu --der1 0.2 --der2 0.4 --dfr1 0.3 --dfr2 0.4 --temp 0.4
# Cora with public split
python main.py --dataname cora --split public --epochs 400 --lr 5e-4 --wd 1e-5 --hid_dim 256 --out_dim 256 --act_fn relu --der1 0.3 --der2 0.4 --dfr1 0.3 --dfr2 0.4 --temp 0.4
# Citeseer with random split
python main.py --dataname citeseer --epochs 200 --lr 1e-3 --wd 1e-5 --hid_dim 256 --out_dim 256 --act_fn prelu --der1 0.2 --der2 0.0 --dfr1 0.3 --dfr2 0.2 --temp 0.9
# Citeseer with public split
python main.py --dataname citeseer --split public --epochs 100 --lr 1e-3 --wd 1e-5 --hid_dim 512 --out_dim 512 --act_fn prelu --der1 0.3 --der2 0.3 --dfr1 0.3 --dfr2 0.3 --temp 0.4
# Pubmed with random split
python main.py --dataname pubmed --epochs 1500 --lr 1e-3 --wd 1e-5 --hid_dim 256 --out_dim 256 --act_fn relu --der1 0.4 --der2 0.1 --dfr1 0.0 --dfr2 0.2 --temp 0.7
# Pubmed with public split
python main.py --dataname pubmed --split public --epochs 1500 --lr 1e-3 --wd 1e-5 --hid_dim 256 --out_dim 256 --act_fn relu --der1 0.4 --der2 0.1 --dfr1 0.0 --dfr2 0.2 --temp 0.7
```
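For context, a random 1:9 node split can be generated roughly as in the sketch below. This is a hedged illustration only; it is not necessarily how this example's `dataset.py` or `eval.py` builds its masks.

```python
# Hedged sketch of a random 1:9 train/test node split (illustrative only).
import torch as th

def random_split_masks(num_nodes, train_ratio=0.1, seed=0):
    perm = th.randperm(num_nodes, generator=th.Generator().manual_seed(seed))
    n_train = int(num_nodes * train_ratio)
    train_mask = th.zeros(num_nodes, dtype=th.bool)
    test_mask = th.zeros(num_nodes, dtype=th.bool)
    train_mask[perm[:n_train]] = True   # 10% of nodes for training
    test_mask[perm[n_train:]] = True    # remaining 90% for testing
    return train_mask, test_mask
```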
## Performance

For the random split, we use the hyper-parameters stated in the paper. For the public split, we find that those hyper-parameters lead to poor performance, so we select new ones via a small grid search, roughly as sketched below.
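The sketch below shows what such a grid search could look like in practice; the candidate values and the idea of emitting one `main.py` command per configuration are illustrative assumptions, not the exact procedure used for this example.

```python
# Hypothetical sketch of a small grid search over public-split hyper-parameters.
# The candidate values are illustrative; the best run is picked by comparing accuracies.
import itertools

search_space = {
    '--lr': ['5e-4', '1e-3'],
    '--temp': ['0.4', '0.7', '0.9'],
    '--der1': ['0.2', '0.3', '0.4'],
    '--dfr1': ['0.0', '0.2', '0.3'],
}

# Emit one training command per configuration.
for values in itertools.product(*search_space.values()):
    flags = ' '.join(f'{k} {v}' for k, v in zip(search_space.keys(), values))
    print(f'python main.py --dataname cora --split public {flags}')
```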
Random split (Train/Test = 1:9)

@@ -64,6 +86,6 @@ Public split

| Dataset       | Cora | Citeseer | Pubmed |
| :-----------: | :--: | :------: | :----: |
| Author's Code | 81.9 | 71.2     | 80.6   |
| DGL           | 82.2 | 71.4     | 80.2   |
config.yaml

cora:
  learning_rate: 0.0005
  num_hidden: 128
  num_proj_hidden: 128
  activation: 'relu'
  num_layers: 2
  drop_edge_rate_1: 0.2
  drop_edge_rate_2: 0.4
  drop_feature_rate_1: 0.3
  drop_feature_rate_2: 0.4
  tau: 0.4
  num_epochs: 400
  weight_decay: 0.00001
citeseer:
  learning_rate: 0.001
  num_hidden: 256
  num_proj_hidden: 256
  activation: 'prelu'
  num_layers: 2
  drop_edge_rate_1: 0.2
  drop_edge_rate_2: 0.0
  drop_feature_rate_1: 0.3
  drop_feature_rate_2: 0.2
  tau: 0.9
  num_epochs: 200
  weight_decay: 0.00001
pubmed:
  learning_rate: 0.001
  num_hidden: 256
  num_proj_hidden: 256
  activation: 'relu'
  num_layers: 2
  drop_edge_rate_1: 0.4
  drop_edge_rate_2: 0.1
  drop_feature_rate_1: 0.0
  drop_feature_rate_2: 0.2
  tau: 0.7
  num_epochs: 1500
  weight_decay: 0.00001
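Before this commit, main.py loaded these per-dataset settings from config.yaml via PyYAML's SafeLoader; this commit replaces that with the command-line flags shown above. A minimal sketch of the old pattern follows; the `load_config` helper name is illustrative, while the keys come from the file above.

```python
# Minimal sketch of reading per-dataset settings like the YAML above,
# mirroring the yaml.SafeLoader pattern used before this change.
import yaml
from yaml import SafeLoader

def load_config(path, dataname):
    with open(path) as f:
        return yaml.load(f, Loader=SafeLoader)[dataname]

config = load_config('config.yaml', 'cora')
lr = config['learning_rate']         # 0.0005 for cora
hid_dim = config['num_hidden']       # encoder hidden size
out_dim = config['num_proj_hidden']  # projection head size
temp = config['tau']                 # temperature of the contrastive loss
```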
main.py

import argparse
from model import Grace
from aug import aug
from dataset import load
import torch as th
import torch.nn as nn
from eval import label_classification
import warnings

warnings.filterwarnings('ignore')

parser = argparse.ArgumentParser()
parser.add_argument('--dataname', type=str, default='cora')
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--split', type=str, default='random')
parser.add_argument('--epochs', type=int, default=500, help='Number of training epochs.')
parser.add_argument('--lr', type=float, default=0.001, help='Learning rate.')
parser.add_argument('--wd', type=float, default=1e-5, help='Weight decay.')
parser.add_argument('--temp', type=float, default=1.0, help='Temperature.')
parser.add_argument('--act_fn', type=str, default='relu')
parser.add_argument("--hid_dim", type=int, default=256, help='Hidden layer dim.')
parser.add_argument("--out_dim", type=int, default=256, help='Output layer dim.')
parser.add_argument("--num_layers", type=int, default=2, help='Number of GNN layers.')
parser.add_argument('--der1', type=float, default=0.2, help='Drop edge ratio of the 1st augmentation.')
parser.add_argument('--der2', type=float, default=0.2, help='Drop edge ratio of the 2nd augmentation.')
parser.add_argument('--dfr1', type=float, default=0.2, help='Drop feature ratio of the 1st augmentation.')
parser.add_argument('--dfr2', type=float, default=0.2, help='Drop feature ratio of the 2nd augmentation.')
args = parser.parse_args()

if args.gpu != -1 and th.cuda.is_available():

@@ -26,40 +45,37 @@ if args.gpu != -1 and th.cuda.is_available():

else:
    args.device = 'cpu'

if __name__ == '__main__':
    # Step 1: Load hyperparameters =================================================================== #
    lr = args.lr
    hid_dim = args.hid_dim
    out_dim = args.out_dim

    num_layers = args.num_layers
    act_fn = ({'relu': nn.ReLU(), 'prelu': nn.PReLU()})[args.act_fn]

    drop_edge_rate_1 = args.der1
    drop_edge_rate_2 = args.der2
    drop_feature_rate_1 = args.dfr1
    drop_feature_rate_2 = args.dfr2

    temp = args.temp
    epochs = args.epochs
    wd = args.wd
    # Step 2: Prepare data =================================================================== #
    graph, feat, labels, train_mask, test_mask = load(args.dataname)
    in_dim = feat.shape[1]

    # Step 3: Create model =================================================================== #
    model = Grace(in_dim, hid_dim, out_dim, num_layers, act_fn, temp)
    model = model.to(args.device)

    optimizer = th.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)

    # Step 4: Training ======================================================================= #
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
@@ -79,11 +95,12 @@ if __name__ == '__main__':

        print(f'Epoch={epoch:03d}, loss={loss.item():.4f}')

    # Step 5: Linear evaluation ============================================================== #
    print("=== Final ===")
    graph = graph.add_self_loop()
    graph = graph.to(args.device)
    feat = feat.to(args.device)

    embeds = model.get_embedding(graph, feat)

    '''Evaluation Embeddings '''
    label_classification(embeds, labels, train_mask, test_mask, split=args.split)
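The forward/backward part of the training loop is elided between the two hunks above. For orientation, a GRACE training step typically looks roughly like the sketch below; the exact signatures of `aug` and the model's forward call in this repo are assumptions, not taken from the diff.

```python
# Rough, assumed sketch of the training-step body hidden between the hunks above.
graph1, feat1 = aug(graph, feat, drop_feature_rate_1, drop_edge_rate_1)  # stochastic view 1
graph2, feat2 = aug(graph, feat, drop_feature_rate_2, drop_edge_rate_2)  # stochastic view 2

graph1, feat1 = graph1.to(args.device), feat1.to(args.device)
graph2, feat2 = graph2.to(args.device), feat2.to(args.device)

loss = model(graph1, graph2, feat1, feat2)  # contrastive loss between the two views
loss.backward()
optimizer.step()
```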
model.py

@@ -72,7 +72,6 @@ class Grace(nn.Module):

    def get_loss(self, z1, z2):
        # calculate SimCLR loss
        f = lambda x: th.exp(x / self.temp)

        refl_sim = f(self.sim(z1, z1))  # intra-view pairs
        ...
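The remainder of `get_loss` is truncated above. For reference, the SimCLR-style (NT-Xent) objective that GRACE optimizes can be written as the standalone sketch below; `self.sim` is assumed to be cosine similarity of the projected embeddings, and details may differ from the actual model.py.

```python
# Standalone sketch of the NT-Xent / SimCLR-style loss referenced in get_loss above.
# Cosine similarity on normalized projections is assumed; the real Grace class may differ.
import torch as th
import torch.nn.functional as F

def nt_xent(z1, z2, temp=0.5):
    f = lambda x: th.exp(x / temp)
    z1, z2 = F.normalize(z1), F.normalize(z2)
    refl_sim = f(z1 @ z1.t())      # intra-view pairs
    between_sim = f(z1 @ z2.t())   # inter-view pairs
    # Positive pair: the same node in the two views; everything else is a negative.
    denom = refl_sim.sum(1) - refl_sim.diag() + between_sim.sum(1)
    return -th.log(between_sim.diag() / denom)

# The full objective averages over both directions:
# loss = 0.5 * (nt_xent(h1, h2, temp) + nt_xent(h2, h1, temp)); loss = loss.mean()
```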