Layer-Neighbor Sampling -- Defusing Neighborhood Explosion in GNNs
============

- Paper link: [https://arxiv.org/abs/2210.13339](https://arxiv.org/abs/2210.13339)

This is the official LABOR sampling example to reproduce the results in the original paper with the GraphSAGE GNN model. The model can be changed to any other model where NeighborSampler can be used (see the sampler sketch at the end of this README).

Requirements
------------

```bash
pip install requests lightning==2.0.6 ogb
```

How to run
-------

### Minibatch training for node classification

Train with mini-batch sampling on the GPU for node classification on "ogbn-products":

```bash
python3 train_lightning.py --dataset=ogbn-products
```

Results:

```
Test Accuracy: 0.797
```

Any nonnegative integer `i` passed as the `--importance-sampling=i` argument runs the corresponding LABOR-i variant, while `--importance-sampling=-1` runs the LABOR-* variant.

The `--vertex-limit` argument is used if a vertex sampling budget is needed. It adjusts the batch size at the end of every epoch so that the average number of sampled vertices converges to the provided vertex limit. It can be used to replicate the vertex sampling budget experiments in the LABOR paper (a sketch of one such adjustment rule is given at the end of this README).

During training runs, statistics on the number of sampled vertices and edges and the cache miss rate are reported. One can use TensorBoard to inspect their plots during or after training:

```bash
tensorboard --logdir tb_logs
```

## Utilize a GPU feature cache for UVA training

```bash
python3 train_lightning.py --dataset=ogbn-products --use-uva --cache-size=500000
```

## Reduce the GPU feature cache miss rate for UVA training

```bash
python3 train_lightning.py --dataset=ogbn-products --use-uva --cache-size=500000 --batch-dependency=64
```

## Force all layers to share the same neighborhood for shared vertices

```bash
python3 train_lightning.py --dataset=ogbn-products --layer-dependency
```
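## Using the sampler in your own training loop

The CLI flags above map onto keyword arguments of the LABOR sampler. Below is a minimal sketch, assuming the `dgl.dataloading.LaborSampler` API of recent DGL releases; the random graph and node IDs are toy placeholders standing in for ogbn-products:

```python
import dgl
import torch

# Toy stand-ins; in practice use the OGB graph and its training node IDs.
g = dgl.rand_graph(1000, 20000)
train_nids = torch.arange(128)

# LABOR sampler; keyword arguments mirror this example's CLI flags:
#   --importance-sampling=i  -> importance_sampling=i  (-1 means LABOR-*)
#   --layer-dependency       -> layer_dependency=True
#   --batch-dependency=64    -> batch_dependency=64
sampler = dgl.dataloading.LaborSampler(
    [10, 10, 10],            # fanout for each of the three GNN layers
    importance_sampling=0,   # LABOR-0 by default
    layer_dependency=False,  # True shares neighborhoods across layers
    batch_dependency=1,      # >1 correlates consecutive minibatches
)

# Drop-in replacement anywhere a NeighborSampler would be used.
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    batch_size=32, shuffle=True, drop_last=False,
)

for input_nodes, output_nodes, blocks in dataloader:
    pass  # run the forward/backward pass of your GNN model here
```

Because `LaborSampler` produces the same `(input_nodes, output_nodes, blocks)` triples as `NeighborSampler`, the rest of the training loop does not need to change.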
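## Sketch of the vertex-limit adjustment

For the `--vertex-limit` behavior described above, one simple end-of-epoch update rescales the batch size in proportion to how far the observed average is from the budget. This is a hypothetical illustration of such a rule, not necessarily the exact update used by `train_lightning.py`:

```python
def adjust_batch_size(batch_size: int, avg_sampled_vertices: float, vertex_limit: int) -> int:
    """Hypothetical end-of-epoch update: rescale the batch size so the
    average number of sampled vertices drifts toward the vertex limit."""
    scale = vertex_limit / max(avg_sampled_vertices, 1.0)
    return max(1, int(batch_size * scale))

# Example: batch_size=1024 with an observed average of 600_000 sampled
# vertices under a 300_000 budget yields a new batch_size of 512.
print(adjust_batch_size(1024, 600_000, 300_000))
```

Since the number of sampled vertices grows sublinearly with the batch size (sampled neighborhoods overlap), a proportional rule like this overshoots slightly and converges over several epochs rather than in a single step.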