  1. 03 Aug, 2020 1 commit
    • [Distributed] Support multiple servers (#1886) · a4c931a9
      Da Zheng authored
      
      
      * client init graph on the backup servers.
      
      * fix.
      
      * test multi-server.
      
      * fix anonymous dist tensors.
      
      * check #parts.
      
      * fix init_data
      
      * add multi-server multi-client tests.
      
      * update tests in kvstore.
      
      * fix.
      
      * verify the loaded partition.
      
      * fix a bug.
      
      * fix lint.
      
      * fix.
      
      * fix example.
      
      * fix rpc.
      
      * fix pull/push handler for backup kvstore
      
      * fix example readme.
      
      * change ip.
      
      * update docstring.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
  2. 01 Aug, 2020 1 commit
  3. 31 Jul, 2020 1 commit
  4. 27 Jul, 2020 1 commit
  5. 16 Jul, 2020 1 commit
    • [Distributed] Distributed launching script (#1772) · ca9d3216
      Chao Ma authored
      
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * fix launch script.
      Co-authored-by: Da Zheng <zhengda1936@gmail.com>
  6. 15 Jul, 2020 1 commit
  7. 14 Jul, 2020 1 commit
  8. 28 Jun, 2020 1 commit
    • [Distributed] Pytorch example of distributed GraphSage. (#1495) · 02d31974
      Da Zheng authored
      
      
      * add train_dist.
      
      * Fix sampling example.
      
      * use distributed sampler.
      
      * fix a bug in DistTensor.
      
      * fix distributed training example.
      
      * add graph partition.
      
      * add command
      
      * disable pytorch parallel.
      
      * shutdown correctly.
      
      * load diff graphs.
      
      * add ip_config.txt.
      
      * record timing for each step.
      
      * use ogb
      
      * add profiler.
      
      * fix a bug.
      
      * add train_dist.
      
      * Fix sampling example.
      
      * use distributed sampler.
      
      * fix a bug in DistTensor.
      
      * fix distributed training example.
      
      * add graph partition.
      
      * add command
      
      * disable pytorch parallel.
      
      * shutdown correctly.
      
      * load diff graphs.
      
      * add ip_config.txt.
      
      * record timing for each step.
      
      * use ogb
      
      * add profiler.
      
      * add Ips of the cluster.
      
      * fix exit.
      
      * support multiple clients.
      
      * balance node types and edges.
      
      * move code.
      
      * remove run.sh
      
      * Revert "support multiple clients."
      
      * fix.
      
      * update train_sampling.
      
      * fix.
      
      * fix
      
      * remove run.sh
      
      * update readme.
      
      * update readme.
      
      * use pytorch distributed.
      
      * ensure all trainers run the same number of steps.
      
      * Update README.md
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>