- 31 Jul, 2020 1 commit
Da Zheng authored
* Fix bugs.
* Evaluate on both validation and testing.
* Add script.
* Update launch.
* Make train_dist.py independent.
* Update README.
* Generate undirected graph.
* Rename conf_file to part_config.
* Use rsync.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
- 28 Jun, 2020 1 commit
Da Zheng authored
* Add train_dist.
* Fix sampling example.
* Use distributed sampler.
* Fix a bug in DistTensor.
* Fix distributed training example.
* Add graph partition.
* Add command.
* Disable PyTorch parallelism.
* Shut down correctly.
* Load different graphs.
* Add ip_config.txt.
* Record timing for each step.
* Use OGB.
* Add profiler.
* Fix a bug.
* Add IPs of the cluster.
* Fix exit.
* Support multiple clients.
* Balance node types and edges.
* Move code.
* Remove run.sh.
* Revert "support multiple clients."
* Fix.
* Update train_sampling.
* Update README.
* Use PyTorch distributed.
* Ensure all trainers run the same number of steps.
* Update README.md.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>