- 09 Feb, 2022 1 commit
-
-
Rhett Ying authored
* enable launching multiple client groups sequentially
* simultaneous launch is enabled
* refine docstring
* revert unnecessary change
* [DOC] add doc for long live server
* refine
* refine doc
* refine doc
-
- 08 Nov, 2021 1 commit
-
-
Rhett Ying authored
Remove self-loops and duplicate edges before ParMETIS and restore when converting to DGLGraph (#3472)
* save self-loops and duplicated edges separately.
* [BugFix] sort graph by dgl.ETYPE
* fix bugs in verify script
* fix verify logic
* refine README
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 23 Sep, 2021 1 commit
-
-
xiang song(charlie.song) authored
[Distributed] Allow user to pass-in extra env parameters when launching a distributed training task. (#3375)
* Allow user to pass-in extra env parameters when launching a distributed training task.
* Update
* upd
Co-authored-by: xiangsx <xiangsx@ip-10-3-59-214.eu-west-1.compute.internal>
-
- 14 Sep, 2021 1 commit
-
-
xiang song(charlie.song) authored
* put PYTHONPATH in server launch
* remove prints
Co-authored-by: xiangsx <xiangsx@ip-10-3-59-214.eu-west-1.compute.internal>
-
- 17 Aug, 2021 1 commit
-
-
Eric Kim authored
[Tools] In `tools/launch.py`, correctly pass all DGL client/server env vars if udf is a multi-command (#3245)
* Correctly pass all DGL client/server env vars if udf is a multi-command
* Refactor to use wrap_cmd_with_local_envvars() helper fn
-
- 02 Aug, 2021 2 commits
-
-
Ankit Garg authored
* Added code for rectifying (TypeError: unhashable type: 'slice') when copying file
* 1) added distributed preprocessing code to create ParMETIS input from CSV files 2) added code to run pm_dglpart on multiple machines 3) added support for recreating a heterogeneous graph from a homogeneous graph based on dropped edges, as ParMETIS currently only supports homogeneous graphs
* move to pandas
* Added comments and removed drop_duplicates as it was redundant
* Addressed PR comments
* Rename variable
* Added comment
* Added comment
* updated README
Co-authored-by: Ankit Garg <gaank@amazon.com>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
Eric Kim authored
* Refactors torch dist launcher udf-wrap code to handle more python versions * minor changes
-
- 30 Jul, 2021 1 commit
-
-
Eric Kim authored
-
- 02 Jul, 2021 1 commit
-
-
ankit-garg authored
Co-authored-by: Ankit Garg <gaank@amazon.com>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 26 May, 2021 1 commit
-
-
Da Zheng authored
* explicitly set the graph format.
* fix.
* fix.
* fix launch script.
* fix readme.
Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 01 May, 2021 1 commit
-
-
Da Zheng authored
* kill training jobs.
* update.
* fix.
Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-73-81.ec2.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 08 Apr, 2021 1 commit
-
-
Da Zheng authored
Co-authored-by: Ubuntu <ubuntu@ip-172-31-73-81.ec2.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 04 Apr, 2021 1 commit
-
-
Da Zheng authored
* set omp thread. * add comment. * fix.
-
- 30 Mar, 2021 1 commit
-
-
Da Zheng authored
* remove num_workers.
* remove num_workers.
* remove num_workers.
* remove num-servers.
* update error message.
* update docstring.
* fix docs.
* fix tests.
* fix test.
* fix.
* print messages in test.
* fix.
* fix test.
* fix.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
-
- 22 Mar, 2021 1 commit
-
-
Da Zheng authored
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 25 Feb, 2021 1 commit
-
-
Da Zheng authored
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
-
- 09 Feb, 2021 1 commit
-
-
Da Zheng authored
* add convert.
* fix.
* add write_mag.
* fix convert_partition.py
* write data.
* use pyarrow to read.
* update write_mag.py
* fix convert_partition.py.
* load node/edge features when necessary.
* reshuffle nodes.
* write mag correctly.
* fix a bug: inner nodes in a partition might be empty.
* fix bugs.
* add verify code.
* insert reverse edges.
* fix a bug.
* add get node/edge data.
* add instructions.
* remove unnecessary argument.
* update distributed preprocessing.
* fix readme.
* fix.
* fix.
* fix.
* fix readme.
* fix doc.
* fix.
* update readme
* update doc.
* update readme.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
-
- 15 Sep, 2020 1 commit
-
-
Chao Ma authored
* update * update
-
- 27 Aug, 2020 1 commit
-
-
Chao Ma authored
* check num_workers * update * update * update * update * update * update
-
- 13 Aug, 2020 1 commit
-
-
Chao Ma authored
* update * update * update * update * update * update * update * update * update * update * update * update * update * update
-
- 12 Aug, 2020 2 commits
- 11 Aug, 2020 2 commits
-
-
Da Zheng authored
* move server start code to initialize. * fix. * fix lint. * fix examples. * add more checks.
-
Chao Ma authored
* remove server_count from ip_config.txt * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * lint * update * update * update * update * update * update * update * update * update * update * update * update * update * Update dist_context.py * fix lint. * make it work for multiple spaces. * update ip_config.txt. * fix examples. * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 10 Aug, 2020 1 commit
-
-
Da Zheng authored
* fix tests.
* fix.
* remove a test.
* make code work in the standalone mode.
* fix example.
* more fix.
* make DistDataloader work with num_workers=0
* fix DistDataloader tests.
* fix.
* fix lint.
* fix cleanup.
* fix test
* remove unnecessary code.
* remove tests.
* fix.
* fix.
* fix.
* fix example
* fix.
* fix.
* fix launch script.
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 09 Aug, 2020 1 commit
-
-
Da Zheng authored
* temp fix omp.
* set server threads.
* add CAPI to set up OMP threads.
* fix.
* fix.
* update namespace.
* set cpi properly.
* allow to config num worker threads.
* set #threads.
* fix.
-
- 08 Aug, 2020 1 commit
-
-
Da Zheng authored
* update launch script * check the correctness of launch script. * fix.
-
- 31 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix bugs.
* eval on both validation and testing.
* add script.
* update.
* update launch.
* make train_dist.py independent.
* update readme.
* update readme.
* update readme.
* update readme.
* generate undirected graph.
* rename conf_file to part_config
* use rsync
* make train_dist independent.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 27 Jul, 2020 1 commit
-
-
Chao Ma authored
* update * update * update * update
-
- 17 Jul, 2020 1 commit
-
-
Da Zheng authored
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 16 Jul, 2020 1 commit
-
-
Chao Ma authored
* update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * fix launch script. Co-authored-by:Da Zheng <zhengda1936@gmail.com>
-
- 03 May, 2020 1 commit
-
-
Da Zheng authored
* initial version from distributed training. This is copied from multiprocessing training. * modify for distributed training. * it's runnable now. * measure time in neighbor sampling. * simplify neighbor sampling. * fix a bug in distributed neighbor sampling. * allow single-machine training. * fix a bug. * fix a bug. * fix openmp. * make some improvement. * fix. * add prepare in the sampler. * prepare nodeflow async. * fix a bug. * get id. * simplify the code. * improve. * fix partition.py * fix the example. * add more features. * fix the example. * allow one partition * use distributed kvstore. * do g2l map manually. * fix commandline. * a temp script to save reddit. * fix pull_handler. * add pytorch version. * estimate the time for copying data. * delete unused code. * fix a bug. * print id. * fix a bug * fix a bug * fix a bug. * remove redundant code. * revert modify in sampler. * fix temp script. * remove pytorch version. * fix. * distributed training with pytorch. * add distributed graph store. * fix. * add metis_partition_assignment. * fix a few bugs in distributed graph store. * fix test. * fix bugs in distributed graph store. * fix tests. * remove code of defining DistGraphStore. * fix partition. * fix example. * update run.sh. * only read necessary node data. * batching data fetch of multiple NodeFlows. * simplify gcn. * remove unnecessary code. * use the new copy_from_kvstore. * update training script. * print time in graphsage. * make distributed training runnable. * use val_nid. * fix train_sampling. * add distributed training. * add run.sh * add more timing. * fix a bug. * save graph metadata when partition. * create ndata and edata in distributed graph store. * add timing in minibatch training of GraphSage. * use pytorch distributed. * add checks. * fix a bug in global vs. local ids. * remove fast pull * fix a compile error. * update and add new APIs. * implement more methods in DistGraphStore. * update more APIs. * rename it to DistGraph.
* rename to DistTensor * remove some unnecessary API. * remove unnecessary files. * revert changes in sampler. * Revert "simplify gcn." This reverts commit 0ed3a34ca714203a5b45240af71555d4227ce452. * Revert "simplify neighbor sampling." This reverts commit 551c72d20f05a029360ba97f312c7a7a578aacec. * Revert "measure time in neighbor sampling." This reverts commit 63ae80c7b402bb626e24acbbc8fdfe9fffd0bc64. * Revert "add timing in minibatch training of GraphSage." This reverts commit e59dc8957a414c7df5c316f51d78bce822bdef5e. * Revert "fix train_sampling." This reverts commit ea6aea9a4aabb8ba0ff63070aa51e7ca81536ad9. * fix lint. * add comments and small update. * add more comments. * add more unit tests and fix bugs. * check the existence of shared-mem graph index. * use new partitioned graph storage. * fix bugs. * print error in fast pull. * fix lint * fix a compile error. * save absolute path after partitioning. * small fixes in the example * Revert "[kvstore] support any data type for init_data() (#1465)" This reverts commit 87b6997b. * fix a bug. * disable evaluation. * Revert "Revert "[kvstore] support any data type for init_data() (#1465)"" This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee. * support set and init data. * support set and init data. * Revert "Revert "[kvstore] support any data type for init_data() (#1465)"" This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee. * fix bugs. * fix unit test. * move to dgl.distributed. * fix lint. * fix lint. * remove local_nids. * fix lint. * fix test. * remove train_dist. * revert train_sampling. * rename funcs. * address comments. * address comments. Use NodeDataView/EdgeDataView to keep track of data. * address comments. * address comments. * revert. * save data with DGL serializer. * use the right way of getting shape. * fix lint. * address comments. * address comments. * fix an error in mxnet. * address comments. * add edge_map. * add more test and fix bugs.
Co-authored-by: Zheng <dzzhen@186590dc80ff.ant.amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-131.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-26-167.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-150.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-135.us-west-2.compute.internal>
-
- 08 Mar, 2020 1 commit
-
-
Da Zheng authored
* add metis.
* add test.
* construct partition id.
* link to METIS github repo.
* update metis.
* add a tool for partitioning a graph.
* update metis.
* update.
* update.
* fix metis.
* fix lint
* fix indent.
* another way of building metis.
* disable metis in windows.
* test windows
* fix.
* disable metis for windows properly.
* fix for tensorflow.
* skip test for gpu.
* make graph symmetric
* address comments.
* more comments.
* fix compile
* fix a bug.
* add test.
* change the default #hops of HALO nodes.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-26-167.us-east-2.compute.internal>
-