# GraphSAINT

This DGL example implements the paper *GraphSAINT: Graph Sampling Based Inductive Learning Method*.

Paper link: https://arxiv.org/abs/1907.04931

Author's code: https://github.com/GraphSAINT/GraphSAINT

Contributor: Liu Tang ([@lt610](https://github.com/lt610))

## Dependencies

- Python 3.7.0
- PyTorch 1.6.0
- NumPy 1.19.2
- Scikit-learn 0.23.2
- DGL 0.5.3

## Dataset

All datasets used are provided by the author's [code](https://github.com/GraphSAINT/GraphSAINT). They are available on [Google Drive](https://drive.google.com/drive/folders/1zycmmDES39zVlbVCYs88JTJ1Wm5FbfLz) (alternatively, [Baidu Wangpan (code: f1ao)](https://pan.baidu.com/s/1SOb0SiSAXavwAcNqkttwcg#list/path=%2F)). After downloading the datasets, rename the extracted `graphsaintdata` folder to `data` (a command sketch follows the table below). Dataset summary ("m" stands for multi-label classification, "s" for single-label):

| Dataset | Nodes | Edges | Degree | Feature | Classes | Train/Val/Test |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| PPI | 14,755 | 225,270 | 15 | 50 | 121(m) | 0.66/0.12/0.22 |
| Flickr | 89,250 | 899,756 | 10 | 500 | 7(s) | 0.50/0.25/0.25 |
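
A minimal preparation sketch, assuming the archive was extracted into a folder named `graphsaintdata` alongside this example's scripts:

```bash
# Rename the extracted folder so the training script finds the data.
mv graphsaintdata data
# Each dataset should now be a subfolder, e.g. data/ppi, data/flickr.
ls data
```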

## Minibatch training

Run with the following commands:
```bash
python train_sampling.py --gpu 0 --dataset ppi --sampler node --node-budget 6000 --num-repeat 50 --n-epochs 1000 --n-hidden 512 --arch 1-0-1-0
python train_sampling.py --gpu 0 --dataset ppi --sampler edge --edge-budget 4000 --num-repeat 50 --n-epochs 1000 --n-hidden 512 --arch 1-0-1-0 --dropout 0.1
python train_sampling.py --gpu 0 --dataset ppi --sampler rw --num-roots 3000 --length 2 --num-repeat 50 --n-epochs 1000 --n-hidden 512 --arch 1-0-1-0 --dropout 0.1
python train_sampling.py --gpu 0 --dataset flickr --sampler node --node-budget 8000 --num-repeat 25 --n-epochs 30 --n-hidden 256 --arch 1-1-0 --dropout 0.2
python train_sampling.py --gpu 0 --dataset flickr --sampler edge --edge-budget 6000 --num-repeat 25 --n-epochs 15 --n-hidden 256 --arch 1-1-0 --dropout 0.2
python train_sampling.py --gpu 0 --dataset flickr --sampler rw --num-roots 6000 --length 2 --num-repeat 25 --n-epochs 15 --n-hidden 256 --arch 1-1-0 --dropout 0.2
```
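
For intuition, here is a hedged sketch of what the `node` sampler conceptually does (the function name and details are illustrative, not this example's actual implementation): draw up to `--node-budget` nodes with probability roughly proportional to degree, then train on the induced subgraph.

```python
import dgl
import torch

def sample_node_subgraph(g, node_budget):
    """Illustrative GraphSAINT-style node sampler: pick `node_budget` nodes
    with probability roughly proportional to degree and return the induced
    subgraph of the full graph `g`."""
    prob = g.in_degrees().float()
    prob = prob / prob.sum()
    nodes = torch.multinomial(prob, node_budget, replacement=True).unique()
    return dgl.node_subgraph(g, nodes)

# Example: subg = sample_node_subgraph(g, node_budget=6000)  # cf. --node-budget 6000 above
```

The edge and random-walk samplers follow the same pattern, but select an edge budget or root-plus-walk node sets instead of individual nodes.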

## Comparison

* Paper: results reported in the paper
* Running: results obtained by running the authors' code
* DGL: results obtained by running this DGL example

### F1-micro
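
Micro-F1 is computed over all (node, class) decisions. Below is a minimal sketch using scikit-learn (a listed dependency); the function and variable names are illustrative. Multi-label datasets such as PPI threshold a per-class sigmoid, while single-label datasets such as Flickr take an argmax:

```python
import torch
from sklearn.metrics import f1_score

def micro_f1(logits, labels, multilabel):
    """Illustrative micro-F1 computation for multi-label (PPI) and
    single-label (Flickr) node classification."""
    if multilabel:
        preds = (logits > 0).long()          # equivalent to sigmoid(logit) > 0.5
    else:
        preds = logits.argmax(dim=1)
        if labels.dim() > 1:                 # one-hot labels -> class indices
            labels = labels.argmax(dim=1)
    return f1_score(labels.cpu().numpy(), preds.cpu().numpy(), average="micro")
```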

#### Random node sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Paper | 0.960±0.001 | 0.507±0.001 |
| Running | 0.9628 | 0.5077 |
| DGL | 0.9618 | 0.4828 |

#### Random edge sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Paper | 0.981±0.007 | 0.510±0.002 |
| Running | 0.9810 | 0.5066 |
| DGL | 0.9818 | 0.5054 |

#### Random walk sampler

| Method | PPI | Flickr |
| --- | --- | --- |
| Paper | 0.981±0.004 | 0.511±0.001 |
| Running | 0.9812 | 0.5104 |
| DGL | 0.9818 | 0.5018 |

### Sampling time
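
The *Normalization* rows time GraphSAINT's pre-processing step: a batch of subgraphs is sampled up front and node/edge occurrence counts are accumulated to estimate the aggregator and loss normalization coefficients (e.g. the paper's aggregator coefficient is estimated as C_{u,v}/C_v). A hedged sketch of that counting loop, assuming a `sample_subgraph()` callable like the node-sampler sketch above:

```python
import dgl
import torch

def estimate_counts(g, sample_subgraph, num_repeat):
    """Illustrative GraphSAINT pre-processing: sample `num_repeat` subgraphs of
    the full graph `g` and count how often each node/edge appears; the counts
    are then used to build the normalization coefficients."""
    node_count = torch.zeros(g.num_nodes())
    edge_count = torch.zeros(g.num_edges())
    for _ in range(num_repeat):
        subg = sample_subgraph()                 # e.g. the node-sampler sketch above
        node_count[subg.ndata[dgl.NID]] += 1.0   # original node IDs kept by dgl.node_subgraph
        edge_count[subg.edata[dgl.EID]] += 1.0   # original edge IDs kept by dgl.node_subgraph
    return node_count, edge_count
```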

#### Random node sampler

| Stage | PPI | Flickr |
| --- | --- | --- |
| Sampling (Running) | 0.77 | 0.65 |
| Sampling (DGL) | 0.24 | 0.57 |
| Normalization (Running) | 0.69 | 2.84 |
| Normalization (DGL) | 1.04 | 0.41 |

#### Random edge sampler

| Stage | PPI | Flickr |
| --- | --- | --- |
| Sampling (Running) | 0.72 | 0.56 |
| Sampling (DGL) | 0.50 | 0.72 |
| Normalization (Running) | 0.68 | 2.62 |
| Normalization (DGL) | 0.61 | 0.38 |

#### Random walk sampler

| Stage | PPI | Flickr |
| --- | --- | --- |
| Sampling (Running) | 0.83 | 1.22 |
| Sampling (DGL) | 0.28 | 0.63 |
| Normalization (Running) | 0.87 | 2.60 |
| Normalization (DGL) | 0.70 | 0.42 |