- 25 Jan, 2021 1 commit
Da Zheng authored
* Distributed heterograph (#3)
* heterogeneous graph partition.
* fix graph partition book for heterograph.
* load heterograph partitions.
* update DistGraphServer to support heterograph.
* make DistGraph runnable for heterograph.
* partition a graph and store parts with homogeneous graph structure.
* update DistGraph server & client to use homogeneous graph.
* shuffle node IDs based on node types.
* load mag in heterograph.
* fix per-node-type mapping.
* balance node types.
* fix for homogeneous graph
* store etype for now.
* fix data name.
* fix a bug in example.
* add profiler in rgcn.
* heterogeneous RGCN.
* map homogeneous node IDs to hetero node IDs.
* fix graph partition book.
* fix DistGraph.
* shuffle eids.
* verify eids and their mappings when loading a partition.
* ID map from homogeneous IDs to per-type IDs.
* verify partitioned results.
* add test for distributed sampler....
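As a rough illustration of the partitioning flow the items above describe, here is a minimal sketch using DGL's distributed partitioning API. The toy graph, graph name, and output path are assumptions made for the example, not values taken from the commit.

```python
import torch as th
import dgl

# Toy heterogeneous graph (illustrative only): authors write papers,
# papers cite papers.
g = dgl.heterograph({
    ('author', 'writes', 'paper'): (th.tensor([0, 1]), th.tensor([0, 1])),
    ('paper', 'cites', 'paper'): (th.tensor([0, 1]), th.tensor([1, 0])),
})

# Partition into 2 parts. Internally each part is stored as a homogeneous
# graph structure, and node/edge IDs are shuffled so that each node type
# (and each partition) owns a contiguous ID range; the graph partition
# book records the mapping between homogeneous IDs and per-type IDs.
dgl.distributed.partition_graph(
    g,
    graph_name='toy_hetero',       # assumed name
    num_parts=2,
    out_path='toy_hetero_parts',   # assumed output directory
)
```

The "balance node types" item corresponds to asking the partitioner to balance the number of nodes of each type across parts; in later DGL releases this surfaces as the `balance_ntypes` / `balance_edges` arguments of `partition_graph`.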
- 02 Sep, 2020 1 commit
Chao Ma authored
- 25 Aug, 2020 1 commit
Chao Ma authored
* fix issues on GPU * update * update * update * update * update * update * update * update * update
Co-authored-by: Ma <manchao@38f9d3587685.ant.amazon.com>
- 13 Aug, 2020 1 commit
Chao Ma authored
* update * update * update * update * update * update * update * update * update * update * update * update * update * update
- 12 Aug, 2020 2 commits
- 11 Aug, 2020 1 commit
Chao Ma authored
* remove server_count from ip_config.txt * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * lint * update * update * update * update * update * update * update * update * update * update * update * update * update * Update dist_context.py * fix lint. * make it work for multiple spaces. * update ip_config.txt. * fix examples. * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
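For context on the `ip_config.txt` items above: after this change, each line of the file carries only a host address and a port; the number of servers per machine is supplied separately. The addresses, the port, and the `dgl.distributed.initialize` call below are illustrative assumptions (its signature has shifted across DGL versions).

```python
import dgl

# ip_config.txt (hypothetical contents), one machine per line:
#
#   172.31.19.1 30050
#   172.31.19.115 30050
#
# The per-machine server count is no longer a third column in this file.
dgl.distributed.initialize('ip_config.txt')
```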
- 10 Aug, 2020 1 commit
Da Zheng authored
* fix tests. * fix. * remove a test. * make code work in the standalone mode. * fix example. * more fix. * make DistDataloader work with num_workers=0 * fix DistDataloader tests. * fix. * fix lint. * fix cleanup. * fix test * remove unnecessary code. * remove tests. * fix. * fix. * fix. * fix example * fix. * fix. * fix launch script.
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
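A minimal sketch of how the DistDataloader mentioned above is typically wired to distributed neighbor sampling; the helper name, fanout, and batch size are assumptions for illustration. With `num_workers=0`, the collate function simply runs in the trainer process.

```python
import torch as th
import dgl

def make_dist_dataloader(g, train_nid, fanout=10, batch_size=1024):
    """Build a DistDataLoader over training node IDs of a DistGraph `g`."""

    def sample_blocks(seeds):
        # Collate function: sample in-neighbors of the seed nodes from the
        # distributed graph; with num_workers=0 this runs in the trainer process.
        seeds = th.LongTensor(seeds)
        return dgl.distributed.sample_neighbors(g, seeds, fanout)

    return dgl.distributed.DistDataLoader(
        dataset=train_nid.numpy(),
        batch_size=batch_size,
        collate_fn=sample_blocks,
        shuffle=True,
        drop_last=False,
    )
```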
- 03 Aug, 2020 1 commit
Da Zheng authored
* client init graph on the backup servers. * fix. * test multi-server. * fix anonymous dist tensors. * check #parts. * fix init_data * add multi-server multi-client tests. * update tests in kvstore. * fix. * verify the loaded partition. * fix a bug. * fix lint. * fix. * fix example. * fix rpc. * fix pull/push handler for backup kvstore * fix example readme. * change ip. * update docstring.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
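The "anonymous dist tensors" item above refers to distributed tensors created without a name. A hedged sketch of the difference, assuming `g` is an already-connected `DistGraph` (the dimension and the name 'emb' are made up):

```python
import torch as th
import dgl

def create_dist_tensors(g, dim=16):
    # A named DistTensor: trainers passing the same name refer to the same
    # tensor stored on the (kvstore) servers.
    emb = dgl.distributed.DistTensor((g.num_nodes(), dim), th.float32, name='emb')

    # An anonymous DistTensor: DGL generates a private name internally, so
    # the tensor cannot be shared by name with other trainers.
    scratch = dgl.distributed.DistTensor((g.num_nodes(), dim), th.float32)

    # Reads and writes are served by the kvstore's pull/push handlers,
    # including on backup servers.
    emb[th.arange(4)] = th.zeros(4, dim)
    return emb, scratch
```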
- 01 Aug, 2020 1 commit
xiang song(charlie.song) authored
* Standalone can run * fix * Add save * Fix * Fix * Fix * Fix * debug * test * test * Fix * Fix * log * Fix * fix * Profile * auto sync grad * update * add test for unsupervised dist training * upd * Fix lr * Fix update * sync * fix * Revert "fix" This reverts commit d5caa7398b36125f6d6e2c742a95c6ff4298c9e9. * Fix * unsupervised * Fix * remove debug * Add test case for dist_graph find_edges() * Fix * skip tensorflow test for find_edges * Update readme * remove some test * upd * Update partition_graph.py
Co-authored-by: Ubuntu <ubuntu@ip-172-31-68-185.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
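The `find_edges()` test added above checks edge-endpoint lookup on a `DistGraph`, which unsupervised training uses to turn sampled edge IDs into (source, destination) node pairs. A tiny sketch; the edge IDs are arbitrary:

```python
import torch as th

def edge_endpoints(g, eids):
    # g: dgl.distributed.DistGraph; eids: 1-D tensor of global edge IDs.
    # Returns the source and destination node IDs of those edges.
    src, dst = g.find_edges(eids)
    return src, dst

# e.g. edge_endpoints(g, th.tensor([0, 1, 2]))
```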
- 31 Jul, 2020 1 commit
Da Zheng authored
* fix bugs. * eval on both validation and testing. * add script. * update. * update launch. * make train_dist.py independent. * update readme. * update readme. * update readme. * update readme. * generate undirected graph. * rename conf_file to part_config * use rsync * make train_dist independent.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
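Reflecting the rename of `conf_file` to `part_config` above, the example training scripts take the partition-metadata JSON produced by graph partitioning under the new argument name. A rough sketch of the trainer-side wiring; the argument defaults and the graph name are assumptions, and the exact `initialize`/`DistGraph` signatures vary across DGL versions.

```python
import argparse
import dgl

parser = argparse.ArgumentParser()
parser.add_argument('--graph_name', type=str, default='ogbn-products')  # assumed default
parser.add_argument('--ip_config', type=str, default='ip_config.txt')
parser.add_argument('--part_config', type=str, default=None,
                    help='path to the partition-metadata JSON (formerly conf_file)')
args = parser.parse_args()

# Join the cluster, then attach to the partitioned graph by name.
dgl.distributed.initialize(args.ip_config)
g = dgl.distributed.DistGraph(args.graph_name, part_config=args.part_config)
print(g.num_nodes(), g.num_edges())
```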
- 27 Jul, 2020 1 commit
Chao Ma authored
* update * update * update * update
- 16 Jul, 2020 1 commit
Chao Ma authored
* update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * fix launch script.
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
- 15 Jul, 2020 1 commit
Da Zheng authored
* add standalone mode * add comments. * add tests for sampling. * fix. * make the code run in the standalone mode * fix * fix * fix readme. * fix. * fix test
Co-authored-by: Chao Ma <mctt90@gmail.com>
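Standalone mode runs the trainer, server, and sampler roles inside a single process, which is what lets the sampling tests above run without a cluster. A minimal sketch of turning it on; the `DGL_DIST_MODE` environment variable and the file names are assumptions based on how later DGL releases expose the mode.

```python
import os
import dgl

# Assumed switch: run the client and server roles in this one process
# instead of contacting remote servers.
os.environ['DGL_DIST_MODE'] = 'standalone'

dgl.distributed.initialize('ip_config.txt')  # no remote servers are contacted in this mode
g = dgl.distributed.DistGraph('toy_graph', part_config='toy_graph_parts/toy_graph.json')
```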
- 14 Jul, 2020 1 commit
Da Zheng authored
* run dist server in dgl. * fix bugs. * fix example. * check environment variables and fix lint. * fix lint
- 28 Jun, 2020 1 commit
Da Zheng authored
* add train_dist.
* Fix sampling example.
* use distributed sampler.
* fix a bug in DistTensor.
* fix distributed training example.
* add graph partition.
* add command
* disable pytorch parallel.
* shutdown correctly.
* load diff graphs.
* add ip_config.txt.
* record timing for each step.
* use ogb
* add profiler.
* fix a bug.
* add IPs of the cluster.
* fix exit.
* support multiple clients.
* balance node types and edges.
* move code.
* remove run.sh
* Revert "support multiple clients."
* fix.
* update train_sampling.
* fix.
* fix
* remove run.sh
* update readme.
* update readme.
* use pytorch distributed.
* ensure all trainers run the same number of steps.
* Update README.md
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>
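The "ensure all trainers run the same number of steps" item matters because, once the example switches to pytorch distributed, every trainer must reach the same collective calls; if one partition yields fewer mini-batches than another, the gradient all-reduce deadlocks. One common way to enforce this (a sketch, not necessarily what train_dist.py does) is to agree on the minimum batch count across trainers:

```python
import torch
import torch.distributed as dist

def common_num_steps(local_num_batches):
    """Return the minimum mini-batch count across all trainers.

    Assumes torch.distributed.init_process_group() has already been called.
    Every trainer then loops for exactly this many steps, so the gradient
    synchronization never waits on a participant that ran out of batches.
    """
    n = torch.tensor([local_num_batches], dtype=torch.int64)
    dist.all_reduce(n, op=dist.ReduceOp.MIN)
    return int(n.item())
```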