# DeepWalk

- Paper link: [here](https://arxiv.org/pdf/1403.6652.pdf)
- Other implementations: [gensim](https://github.com/phanein/deepwalk), [deepwalk-c](https://github.com/xgfs/deepwalk-c)

The implementation includes multi-processing training with CPU and mixed training with CPU and multi-GPU.

## Dependencies

- PyTorch 1.0.1+

## Tested version

- PyTorch 1.5.0
- DGL 0.4.3

## How to run the code

Format of a network file:
```
1(node id) 2(node id)
1 3
...
```

To run the code:
```
python3 deepwalk.py --net_file net.txt --emb_file emb.txt --adam --mix --lr 0.2 --num_procs 4 --batch_size 100 --negative 5
```

## How to save the embedding

Functions:
```
SkipGramModel.save_embedding(dataset, file_name)
SkipGramModel.save_embedding_txt(dataset, file_name)
```

## Evaluation

To evaluate the embedding on multi-label classification, please refer to [here](https://github.com/ShawXh/Evaluate-Embedding).

Results on YouTube (1M nodes), with 1%, 3%, 5%, 7%, and 9% of nodes labeled for training:

| Implementation | Macro-F1 (%) 1% / 3% / 5% / 7% / 9% | Micro-F1 (%) 1% / 3% / 5% / 7% / 9% |
|----|----|----|
| gensim.word2vec(hs) | 28.73 / 32.51 / 33.67 / 34.28 / 34.79 | 35.73 / 38.34 / 39.37 / 40.08 / 40.77 |
| gensim.word2vec(ns) | 28.18 / 32.25 / 33.56 / 34.60 / 35.22 | 35.35 / 37.69 / 38.08 / 40.24 / 41.09 |
| ours | 24.58 / 31.23 / 33.97 / 35.41 / 36.48 | 38.93 / 43.17 / 44.73 / 45.42 / 45.92 |

The running-time comparison is shown below, where the numbers in brackets denote the time spent on random walks.

| Implementation | gensim.word2vec(hs) | gensim.word2vec(ns) | Ours |
|----|----|----|----|
| Time (s) | 27119.6 (1759.8) | 10580.3 (1704.3) | 428.89 |

Parameters:
- walk_length = 80, number_walks = 10, window_size = 5
- Ours: 4 GPUs (Tesla V100), lr = 0.2, batch_size = 128, neg_weight = 5, negative = 1, num_thread = 4
- Others: workers = 8, negative = 5

Speed-up with mixed CPU & multi-GPU training. The parameters used are the same as above.

| #GPUs | 1 | 2 | 4 |
|----------|-------|-------|-------|
| Time (s) | 1419.64 | 952.04 | 428.89 |
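The saved text embedding is what downstream evaluation scripts consume. Below is a minimal loader sketch, assuming the file follows the common word2vec text format (an optional `num_nodes dim` header line followed by one `node_id v1 ... vd` line per node); the exact layout written by `SkipGramModel.save_embedding_txt` may differ, so adjust the parsing if needed.

```python
import numpy as np

def load_embedding_txt(path):
    """Load a word2vec-style text embedding file into a {node_id: vector} dict.

    Assumes each line is "node_id v1 v2 ... vd"; a first line of the
    form "num_nodes dim" (the word2vec header) is skipped if present.
    """
    emb = {}
    with open(path) as f:
        for line in f:
            parts = line.strip().split()
            if not parts:
                continue
            if len(parts) == 2 and not emb:
                # Likely the "num_nodes dim" header line; skip it.
                continue
            emb[parts[0]] = np.asarray([float(x) for x in parts[1:]],
                                       dtype=np.float32)
    return emb
```

The returned dict can then be turned into a feature matrix (one row per node) and fed to a multi-label classifier such as one-vs-rest logistic regression, which is the usual evaluation protocol for DeepWalk embeddings.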