# GraphSAINT

This DGL example implements the paper: GraphSAINT: Graph Sampling Based Inductive Learning Method.

Paper link: https://arxiv.org/abs/1907.04931

Authors' code: https://github.com/GraphSAINT/GraphSAINT

Contributor: Liu Tang ([@lt610](https://github.com/lt610))

## Dependencies

- Python 3.7.0
- PyTorch 1.6.0
- NumPy 1.19.2
- Scikit-learn 0.23.2
- DGL 0.5.3

## Dataset

All datasets used are provided by the authors' [code](https://github.com/GraphSAINT/GraphSAINT). They are available on [Google Drive](https://drive.google.com/drive/folders/1zycmmDES39zVlbVCYs88JTJ1Wm5FbfLz) (alternatively, [Baidu Wangpan (code: f1ao)](https://pan.baidu.com/s/1SOb0SiSAXavwAcNqkttwcg#list/path=%2F)). After downloading the datasets, rename the `graphsaintdata` folder to `data`.

Dataset summary ("m" stands for multi-label classification and "s" for single-label):

| Dataset | Nodes | Edges | Degree | Feature | Classes | Train/Val/Test |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| PPI | 14,755 | 225,270 | 15 | 50 | 121(m) | 0.66/0.12/0.22 |
| Flickr | 89,250 | 899,756 | 10 | 500 | 7(s) | 0.50/0.25/0.25 |

## Minibatch training

Run with the following:

```bash
python train_sampling.py --gpu 0 --dataset ppi --sampler node --node-budget 6000 --num-repeat 50 --n-epochs 1000 --n-hidden 512 --arch 1-0-1-0
python train_sampling.py --gpu 0 --dataset ppi --sampler edge --edge-budget 4000 --num-repeat 50 --n-epochs 1000 --n-hidden 512 --arch 1-0-1-0 --dropout 0.1
python train_sampling.py --gpu 0 --dataset ppi --sampler rw --num-roots 3000 --length 2 --num-repeat 50 --n-epochs 1000 --n-hidden 512 --arch 1-0-1-0 --dropout 0.1
python train_sampling.py --gpu 0 --dataset flickr --sampler node --node-budget 8000 --num-repeat 25 --n-epochs 30 --n-hidden 256 --arch 1-1-0 --dropout 0.2
python train_sampling.py --gpu 0 --dataset flickr --sampler edge --edge-budget 6000 --num-repeat 25 --n-epochs 15 --n-hidden 256 --arch 1-1-0 --dropout 0.2
python train_sampling.py --gpu 0 --dataset flickr --sampler rw --num-roots 6000 --length 2 --num-repeat 25 --n-epochs 15 --n-hidden 256 --arch 1-1-0 --dropout 0.2
```

## Comparison

* Paper: results from the paper
* Running: results from experiments with the authors' code
* DGL: results from experiments with the DGL example

### F1-micro

#### Random node sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Paper | 0.960±0.001 | 0.507±0.001 |
| Running | 0.9628 | 0.5077 |
| DGL | 0.9618 | 0.4828 |

#### Random edge sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Paper | 0.981±0.007 | 0.510±0.002 |
| Running | 0.9810 | 0.5066 |
| DGL | 0.9818 | 0.5054 |

#### Random walk sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Paper | 0.981±0.004 | 0.511±0.001 |
| Running | 0.9812 | 0.5104 |
| DGL | 0.9818 | 0.5018 |

### Sampling time

#### Random node sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Sampling(Running) | 0.77 | 0.65 |
| Sampling(DGL) | 0.24 | 0.57 |
| Normalization(Running) | 0.69 | 2.84 |
| Normalization(DGL) | 1.04 | 0.41 |

#### Random edge sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Sampling(Running) | 0.72 | 0.56 |
| Sampling(DGL) | 0.50 | 0.72 |
| Normalization(Running) | 0.68 | 2.62 |
| Normalization(DGL) | 0.61 | 0.38 |

#### Random walk sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Sampling(Running) | 0.83 | 1.22 |
| Sampling(DGL) | 0.28 | 0.63 |
| Normalization(Running) | 0.87 | 2.60 |
| Normalization(DGL) | 0.70 | 0.42 |
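The `--sampler` option in the training commands selects one of three GraphSAINT subgraph samplers (`node`, `edge`, `rw`). The following is a minimal NumPy sketch of what each one does, written for illustration only and not taken from DGL's `train_sampling.py`. All function names are hypothetical. Assumptions: the node sampler weights nodes by in-degree as a simple stand-in for the paper's probability; the edge sampler uses the paper's weighting p(u, v) ∝ 1/deg(u) + 1/deg(v); both draw with replacement; the random-walk sampler picks `num_roots` roots uniformly and walks `length` steps; every sampler returns the subgraph induced on the sampled nodes.

```python
import numpy as np


def induced_edges(src, dst, num_nodes, nodes):
    """Ids of the edges whose two endpoints are both in `nodes`."""
    in_sub = np.zeros(num_nodes, dtype=bool)
    in_sub[nodes] = True
    return np.nonzero(in_sub[src] & in_sub[dst])[0]


def clamped_in_degrees(dst, num_nodes):
    """In-degree of every node, clamped to at least 1 to avoid division by zero."""
    return np.maximum(np.bincount(dst, minlength=num_nodes), 1).astype(float)


def node_sampler(src, dst, num_nodes, node_budget, rng):
    """Like --sampler node (assumed): draw node_budget nodes with replacement, weighted by in-degree."""
    p = clamped_in_degrees(dst, num_nodes)
    nodes = np.unique(rng.choice(num_nodes, size=node_budget, replace=True, p=p / p.sum()))
    return nodes, induced_edges(src, dst, num_nodes, nodes)


def edge_sampler(src, dst, num_nodes, edge_budget, rng):
    """Like --sampler edge: draw edge_budget edges with p(u, v) ~ 1/deg(u) + 1/deg(v)."""
    deg = clamped_in_degrees(dst, num_nodes)
    p = 1.0 / deg[src] + 1.0 / deg[dst]
    eids = rng.choice(len(src), size=edge_budget, replace=True, p=p / p.sum())
    nodes = np.unique(np.concatenate([src[eids], dst[eids]]))
    return nodes, induced_edges(src, dst, num_nodes, nodes)


def rw_sampler(src, dst, num_nodes, num_roots, length, rng):
    """Like --sampler rw: num_roots uniform roots, each taking `length` random-walk steps."""
    # CSR view of out-neighbours: neighbours of v are nbrs[ptr[v]:ptr[v + 1]].
    nbrs = dst[np.argsort(src, kind="stable")]
    ptr = np.concatenate([[0], np.cumsum(np.bincount(src, minlength=num_nodes))])
    cur = rng.integers(0, num_nodes, size=num_roots)
    visited = [cur]
    for _ in range(length):
        start, end = ptr[cur], ptr[cur + 1]
        # Uniform neighbour choice; the offset is 0 (and unused) for nodes without out-edges.
        offset = (rng.random(num_roots) * (end - start)).astype(np.int64)
        step = nbrs[np.minimum(start + offset, len(nbrs) - 1)]
        cur = np.where(end > start, step, cur)  # dead ends stay in place
        visited.append(cur)
    nodes = np.unique(np.concatenate(visited))
    return nodes, induced_edges(src, dst, num_nodes, nodes)


# Toy undirected 6-cycle stored as directed edges in both directions.
src = np.array([0, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 0])
dst = np.array([1, 2, 3, 4, 5, 0, 0, 1, 2, 3, 4, 5])
rng = np.random.default_rng(0)
samples = {
    "node": node_sampler(src, dst, 6, node_budget=3, rng=rng),
    "edge": edge_sampler(src, dst, 6, edge_budget=2, rng=rng),
    "rw": rw_sampler(src, dst, 6, num_roots=2, length=2, rng=rng),
}
for name, (nodes, eids) in samples.items():
    print(name, nodes.tolist(), eids.tolist())
```

In this sketch the budgets play the role of `--node-budget`, `--edge-budget`, and `--num-roots`/`--length`; all three samplers end with the same induction step, so only the way the node set is chosen differs.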
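The "Normalization" rows in the timing tables measure the cost of estimating GraphSAINT's bias-correcting coefficients from pre-sampled subgraphs. Following the paper, if N subgraphs are sampled, C_v counts the subgraphs containing node v and C_{u,v} counts those containing edge (u, v); the loss term of node v is weighted by 1/λ_v with λ_v = |V|·C_v/N, and the message along (u, v) is scaled by C_v/C_{u,v}. The sketch below is a hedged NumPy illustration of that estimator, not DGL's code. The name `estimate_norms` and the `sample_nodes` callback are hypothetical, and we only assume that `--num-repeat` controls how many subgraphs are pre-sampled for this estimate.

```python
import numpy as np


def estimate_norms(src, dst, num_nodes, sample_nodes, num_subg, rng):
    """Estimate GraphSAINT-style normalization coefficients from num_subg subgraphs.

    sample_nodes(rng) is a hypothetical callback returning the node ids of one
    sampled subgraph; its edges are taken to be the induced ones.
    Returns (loss_norm, aggr_norm, C_v, C_uv).
    """
    node_counter = np.zeros(num_nodes)  # C_v: subgraphs containing node v
    edge_counter = np.zeros(len(src))   # C_{u,v}: subgraphs containing edge (u, v)
    for _ in range(num_subg):
        in_sub = np.zeros(num_nodes, dtype=bool)
        in_sub[sample_nodes(rng)] = True
        node_counter += in_sub
        edge_counter += in_sub[src] & in_sub[dst]
    # Loss weight 1 / lambda_v, lambda_v = |V| * C_v / N (never-sampled nodes clamped to C_v = 1).
    loss_norm = num_subg / (np.maximum(node_counter, 1) * num_nodes)
    # Aggregator weight C_v / C_{u,v}, where v is the destination node of the edge.
    aggr_norm = node_counter[dst] / np.maximum(edge_counter, 1)
    return loss_norm, aggr_norm, node_counter, edge_counter


# Toy undirected 6-cycle, with a hypothetical uniform 3-node sampler as a stand-in.
src = np.array([0, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 0])
dst = np.array([1, 2, 3, 4, 5, 0, 0, 1, 2, 3, 4, 5])
uniform = lambda rng: rng.choice(6, size=3, replace=False)
loss_norm, aggr_norm, c_v, c_uv = estimate_norms(
    src, dst, 6, uniform, num_subg=50, rng=np.random.default_rng(0))
print(loss_norm.round(3), aggr_norm.round(2))
```

Because an induced edge (u, v) can only appear when v does, C_{u,v} ≤ C_v, so every aggregator weight of a sampled edge is at least 1. This estimation runs once before training, which is why it is reported separately from per-batch sampling time.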