- 11 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Feature] enable TP::Receiver wait for any numbers of senders * fix random unit test failure * avoid endless future wait * fix unit test failure * fix seg fault when finalize wait in receiver * [Feature] refactor sender connect logic and remove unnecessary sleeps in unit tests * fix lint * release RPCContext resources before process exits * [Debug] TPReceiver wait start log * [Debug] add log in get port * [Debug] add log * [ReDebug] revert time sleep in unit tests * [Debug] remove sleep for test_distri,test_mp * [debug] add more log * [debug] add listen_booted_ flag * [debug] restore commented code for queue * [debug] sleep more in rpc_client * restore change in tests * Revert "restore change in tests" This reverts commit 41a18926d181ec2517069389bfc41de2cc949280. * Revert "[debug] sleep more in rpc_client" This reverts commit a908e758eabca0a6ce62eb2e59baea02a840ac67. * Revert "[debug] restore commented code for queue" This reverts commit d3f993b3746e6bb6e2cc2f90204dd7e9461c6301. * Revert "[debug] add listen_booted_ flag" This reverts commit 244b2167d94942ff2a0acec8823b974975e52580. * Revert "[debug] add more log" This reverts commit 4b78447b0a575a824821dc7e25cca2246e6e30e2. * Revert "[Debug] remove sleep for test_distri,test_mp" This reverts commit e1df1aadcc8b1c2a0013ed77322ac391a8807612. * remove debug code * revert unnecessary change * revert unnecessary changes * always reset RPCContext when get started and reset all data * remove time.sleep in dist tests * fix lint * reset envs before each dist test * reset env properly * add time sleep when start each server * sleep for a while when boot server * replace wait_thread with callback * fix lint * add dglconnect handshake check Co-authored-by:Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 06 Dec, 2021 2 commits
-
-
Jinjing Zhou authored
* doesn't know whether works * add change * fix * fix * fix * remove * revert * lint * lint * fix * revert * lint * fix * only build rpc on linux * lint * lint * fix build on windows * fix windows * remove old test * fix cmake * Revert "remove old test" This reverts commit f1ea75c777c34cdc1f08c0589676ba6aee1feb29. * fix windows * fix * fix * fix indent * fix indent * address comment * fix * fix * fix * fix * fix * lint * fix indent * fix lint * add introduction * fix * lint * lint * add more logs * fix * update xbyak for C++14 with gcc5 * Remove channels * fix * add test script * fix * remove unused file * fix lint * add timeout
-
Quan (Andy) Gan authored
* first commit * second commit * spaghetti unit tests * rewrite test
-
- 12 Oct, 2021 1 commit
-
-
Rhett Ying authored
-
- 01 Sep, 2021 1 commit
-
-
xiang song(charlie.song) authored
[Feature] Add a HINT for the per edge type sampler of heterogeneous DistGraph highlighting that the etypes are already sorted. (#3260) * pass cpp test * distgraph use sorted edge flag. * lint * trigger * update test Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 29 Aug, 2021 1 commit
-
-
Da Zheng authored
* handle empty frontiers. * fix lint. * fix Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 06 Aug, 2021 1 commit
-
-
xiang song(charlie.song) authored
* Fix dist negative data loader bug * upd * Fix Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 28 Jul, 2021 2 commits
-
-
Da Zheng authored
* make heterogeneous find_edges * add distributed EdgeDataLoader. * fix. * fix a bug. * fix bugs. * add tests on distributed heterogeneous graph sampling. * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
xiang song(charlie.song) authored
* fix. * fix. * fix. * fix. * Fix test * Deprecate old DistEmbedding impl, use synchronized embedding impl * Basic impl of heterogeneous on homogeneous sampling * make pass * Pass C++ test * Add python test code * lint * lint * Add MultiLayerEtypeNeighborSampler * Add unit test for single machine dataloader * Add dist dataloader test for edge type sampler * Fix lint * fix * support for per etype sample * Fix some bug and enable distributed training with per edge sample * fix * Now distributed training works * turn off some mxnet * turn off mxnet for some dist test * fix * upd * upd according to the comments * Fix * Fix test and now distributed works. * upd * upd * Fix * Fix bug * remove dead code. * upd * Fix * upd * Fix Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal> Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 17 Jul, 2021 1 commit
-
-
Da Zheng authored
* support hetero RGCN. * fix. * simplify code. * sample_neighbors return heterograph directly. * avoid using to_heterogeneous. * compute canonical etypes in advance. * fix tests. * fix. * fix distributed data loader for heterograph. * use NodeDataLoader. * fix bugs in partitioning on heterogeneous graphs. * fix lint. * fix tests. * fix. * fix. * fix bugs. * fix tests. * fix. * enable coo for distributed. * fix. * fix. * fix. * fix. * fix. Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
- 13 Jul, 2021 1 commit
-
-
xiang song(charlie.song) authored
* fix. * fix. * fix. * fix. * Fix test * Deprecate old DistEmbedding impl, use synchronized embedding impl * update doc Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal> Co-authored-by: Da Zheng <zhengda1936@gmail.com> Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 05 Jul, 2021 1 commit
-
-
Da Zheng authored
* fix. * fix. * fix. * fix. * Fix test Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 02 Jul, 2021 2 commits
-
-
Da Zheng authored
* fix bugs in partitioning on heterogeneous graphs. * fix. * fix. * fix example. * fix. * fix test. * fix. * fix. * fix. * fix tests. Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
Jinjing Zhou authored
* try enable kvstore test * fix * fix * separate out kvstore test * add comment
-
- 25 Jun, 2021 1 commit
-
-
Quan (Andy) Gan authored
* csr and csc creation * fix * fix * fixes to adj transpose * fine * raise error if indptr did not match number of nodes * fix * huh? * oh Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 16 Jun, 2021 1 commit
-
-
Da Zheng authored
* add. * fix. * fix. * fix. * fix. * add tests. * support node split and edge split. * support 1 partition. * add tests. * fix. * fix test. * use hierarchical partition. * add check. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-22-57.us-west-2.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 26 May, 2021 1 commit
-
-
Da Zheng authored
* explicitly set the graph format. * fix. * fix. * fix launch script. * fix readme. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 18 May, 2021 1 commit
-
-
Da Zheng authored
* add distributed in-degree and out-degree. * update comments. * fix a bug. * add tests. * add tests. * fix a bug. * fix docstring. * update doc. * fix * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 03 May, 2021 1 commit
-
-
xiang song(charlie.song) authored
* Draft for sparse emb * add some notes * Fix * Add sparse optim for dist pytorch * Update test * Fix * upd * upd * Fix * Fix * Fix bug * add transductive example * Fix example * Some fix * Upd * Fix lint * lint * lint * lint * upd * Fix lint * lint * upd * remove dead import * update * lint * update unit test * update example * Add adam optimizer * Add unit test and update data * upd * upd * upd * Fix docstring and fix some bug in example code * Update rgcn readme Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-25.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 26 Apr, 2021 1 commit
-
-
Da Zheng authored
* update distributed training doc. * explain data split. * fix message passing. * id mapping. * fix. * test data reshuffling. * fix a bug. * fix test. * Revert "fix." This reverts commit 2d025e9e1a5c05c3da9b803a035a788ced59bd77. * Revert "id mapping." This reverts commit 2a6a93ceb81fbdff86e6e9e5a58e1ace1e9d9882. * Revert "fix message passing." This reverts commit ed8a86bf2b015e5e4f64ba160e81b207ad2a1d65. * Revert "explain data split." This reverts commit 4338ddf8a336014cf92d4cb9a1db02b9badc0e55. * Revert "update distributed training doc." This reverts commit dda1c35c44536934c19715534f01f832afda6ad2. * add more tests. * fix. * fix. * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 22 Apr, 2021 1 commit
-
-
Da Zheng authored
* return mapping. * support heterogeneous graph. * more test. * fix lint. * fix for diff backends. * fix. * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 13 Apr, 2021 1 commit
-
-
Da Zheng authored
* fix. * test distributed graph without node/edge data. * remove some tests. * fix lint
-
- 01 Apr, 2021 1 commit
-
-
Minjie Wang authored
-
- 30 Mar, 2021 1 commit
-
-
Da Zheng authored
* remove num_workers. * remove num_workers. * remove num_workers. * remove num-servers. * update error message. * update docstring. * fix docs. * fix tests. * fix test. * fix. * print messages in test. * fix. * fix test. * fix. Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
-
- 25 Jan, 2021 1 commit
-
-
Da Zheng authored
* Distributed heterograph (#3) * heterogeneous graph partition. * fix graph partition book for heterograph. * load heterograph partitions. * update DistGraphServer to support heterograph. * make DistGraph runnable for heterograph. * partition a graph and store parts with homogeneous graph structure. * update DistGraph server&client to use homogeneous graph. * shuffle node Ids based on node types. * load mag in heterograph. * fix per-node-type mapping. * balance node types. * fix for homogeneous graph * store etype for now. * fix data name. * fix a bug in example. * add profiler in rgcn. * heterogeneous RGCN. * map homogeneous node ids to hetero node ids. * fix graph partition book. * fix DistGraph. * shuffle eids. * verify eids and their mappings when loading a partition. * Id map from homogneous Ids to per-type Ids. * verify partitioned results. * add test for distributed sampler. * add mapping from per-type Ids to homogeneous Ids. * update example. * fix DistGraph. * Revert "add profiler in rgcn." This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676. * add tests for homogeneous graphs. * fix a bug. * fix test. * fix for one partition. * fix for standalone training and evaluation. * small fix. * fix two bugs. * initialize projection matrix. * small fix on RGCN. * Fix rgcn performance (#17) Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix. * fix test. * fix lint. * test partitions. * remove redundant test for partitioning. * remove commented code. * fix partition. * fix tests. * fix RGCN. * fix test. * fix test. * fix test. * fix. * fix a bug. * update dmlc-core. * fix. * fix rgcn. * update readme. * add comments. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix. * fix. * add div_int. * fix. * fix. * fix lint. * fix. * fix. * fix. * adjust. * move code. * handle heterograph. * return pytorch tensor in GPB. * remove some tests in example. * add to_block for distributed training. * use distributed to_block. * remove unnecessary function in DistGraph. * remove distributed to_block. * use pytorch tensor. * fix a bug in ntypes and etypes. * enable norm. * make the data loader compatible with the old format. * fix. * add comments. * fix a bug. * add test for heterograph. * support partition without reshuffle. * add test. * support partition without reshuffle. * fix. * add test. * fix bugs. * fix lint. * fix dataset. * fix for mxnet. * update docstring. * rename to floor_div * avoid exposing NodePartitionPolicy and EdgePartitionPolicy. * fix docstring. * fix error. * fixes. * fix comments. * rename. * rename. * explain IdMap. * fix docstring. * fix docstring. * update docstring. * remove the code of returning heterograph. * remove argument. * fix example. * make GraphPartitionBook an abstract class. * fix. * fix. * fix a bug. * fix a bug in example * fix a bug * reverse heterograph sampling. * temp fix. * fix lint. * Revert "temp fix." This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381. * compute norm. * Revert "reverse heterograph sampling." This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9. * fix. * move id_map.py * remove check * add more comments. * update docstring. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
-
- 18 Aug, 2020 1 commit
-
-
Da Zheng authored
* add doc. * update DistGraph. * add DistTensor. * update DistEmbedding. * add partition.py * add sampling. * fix. * add graph partition book and create a base class. * fix test. * add rst. * update doc rst. * update. * fix. * fix docs * update distributed tensor and embeddings. * add checks. * update DistGraph. * update initialization. * fix graph partition book. * update graph partition book. * update partition. * update partition. * fix. * add example code. * update DistGraph * Update python/dgl/distributed/dist_context.py Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_context.py Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_dataloader.py Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_dataloader.py Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_dataloader.py Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> * update initialize. * update dataloader. * update distgraph. * update DistGraph. * update DistTensor. * update. * more updates. * fix lint. * add num_nodes and num_edges Co-authored-by: Chao Ma <mctt90@gmail.com> Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 17 Aug, 2020 1 commit
-
-
Mufei Li authored
* Update graph * Fix for dgl.graph * from_scipy * Replace canonical_etypes with relations * from_networkx * Update for hetero_from_relations * Roll back the change of canonical_etypes to relations * heterograph * bipartite * Update doc * Fix lint * Fix lint * Fix test cases * Fix * Fix * Fix * Fix * Fix * Fix * Update * Fix test * Fix * Update * Use DGLError * Update * Update * Update * Update * Fix * Fix * Fix * Fix * Fix * Fix * Fix * Fix * Update * Fix * Update * Fix * Fix * Fix * Update * Fix * Update * Fix * Update * Update * Update * Update * Update * Update * Update * Fix * Fix * Update * Update * Update * Update * Update * Update * rewrite sanity checks * delete unnecessary checks * Update * Update * Update * Update * Update * Update * Update * Update * Fix * Update * Update * Update * Fix * Fix * Fix * Update * Fix * Update * Fix * Fix * Update * Fix * Update * Fix Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by:
Quan Gan <coin2028@hotmail.com>
-
- 14 Aug, 2020 2 commits
- 12 Aug, 2020 1 commit
-
-
Da Zheng authored
* rename get_data_size. * remove g from DistTensor. * remove g from DistEmbedding. * clean up API of graph partition book. * fix DistGraph * fix lint. * collect all part policies. * fix. * fix. * support distributed sampler. * remove partition.py
-
- 11 Aug, 2020 1 commit
-
-
Chao Ma authored
* remove server_count from ip_config.txt * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * lint * update * update * update * update * update * update * update * update * update * update * update * update * update * Update dist_context.py * fix lint. * make it work for multiple spaces. * update ip_config.txt. * fix examples. * update * update * update * update * update * update * update * update * update * update * update * update * update * update * udpate * update * update * update * update * update Co-authored-by:
Da Zheng <zhengda1936@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 10 Aug, 2020 2 commits
-
-
Da Zheng authored
* fix tests. * fix. * remove a test. * make code work in the standalone mode. * fix example. * more fix. * make DistDataloader work with num_workers=0 * fix DistDataloader tests. * fix. * fix lint. * fix cleanup. * fix test * remove unnecessary code. * remove tests. * fix. * fix. * fix. * fix example * fix. * fix. * fix launch script. Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
Da Zheng authored
* fix perf. * fix. * accelerate metis. * fix lint. * use gklib. * fix perf. * fix. * update metis. * update launch script * handle synchronized API. * fix. * fix example. * fix dataloader. * temp fix. * temp fix omp. * distinguish roles. * initialize iterator of DistDataloader correctly. * check the correctness of launch script. * move feature copy to sampler. * measure mem/network copy time. * remove * Revert "measure mem/network copy time." This reverts commit 86cefdc14b7815fcf5aad6496af912dba48e4aa6. * fix. * fix * fix. * fix cmake. * disable metis in windows. * disable metis tests in windows. * remove test for multigraph. * fix test. * fix. * fix cmake. * fix. * revert. Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 08 Aug, 2020 2 commits
-
-
Da Zheng authored
* distinguish roles. * add comments. * fix lint. * move roles to server_state. * fix text. * fix tests. * fix tests. * Revert "fix tests." This reverts commit 5baa136b872a4550d4e612bfb1dfe363d7814adf.
-
Da Zheng authored
* fix dataloader. * initialize iterator of DistDataloader correctly. * update test.
-
- 05 Aug, 2020 1 commit
-
-
Jinjing Zhou authored
* 111 * 111 * fix * 111 * fix * 11 * fix * lint * Update __init__.py * lint * fix * lint * fix * fix * fix * fix * fix * try fix * try fix * fix * Revert "fix" This reverts commit a0b954fd4e99b7df92b53db8334dcb583d6e1551. * fixes. * fix. * fix test. * fix exit. * fix. * fix * fix * lint * lint * lint * fix * Update .gitignore * 111 * fix * 111 * 111 * fff * 1111 * 111 * 1325315 * ffff * f??? * fff * 1111 * 111 * fix * 111 * asda * 1111 * 11 * 123 * 啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊啊 * spawn * 1231231 * up * 111 * fix * fix * Revert "fix" This reverts commit 7373f95312fdcaa36d2fc330bf242339e89c045d. * fix * fix * 1111 * fix * fix tests * start kvclient as early as possible. * lint * fix test * lint * 1111 * fix * fix * 111 * fix * fix * 1 * fix * fix * lint * fix * lint * lint * remove quit * fix * lint * fix * fix several * lint * fix minor * fix * lint Co-authored-by:
Da Zheng <zhengda1936@gmail.com> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com>
-
- 03 Aug, 2020 1 commit
-
-
Da Zheng authored
* client init graph on the backup servers. * fix. * test multi-server. * fix anonymous dist tensors. * check #parts. * fix init_data * add multi-server multi-client tests. * update tests in kvstore. * fix. * verify the loaded partition. * fix a bug. * fix lint. * fix. * fix example. * fix rpc. * fix pull/push handler for backup kvstore * fix example readme. * change ip. * update docstring. Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 01 Aug, 2020 1 commit
-
-
xiang song(charlie.song) authored
* Standalone can run * fix * Add save * Fix * Fix * Fix * Fix * debug * test * test * Fix * Fix * log * Fix * fix * Profile * auto sync grad * update * add test for unsupervised dist training * upd * Fix lr * Fix update * sync * fix * Revert "fix" This reverts commit d5caa7398b36125f6d6e2c742a95c6ff4298c9e9. * Fix * unsupervised * Fix * remove debug * Add test case for dist_graph find_edges() * Fix * skip tensorflow test for find_edges * Update readme * remove some test * upd * Update partition_graph.py Co-authored-by: Ubuntu <ubuntu@ip-172-31-68-185.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal> Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 31 Jul, 2020 2 commits
-
-
Da Zheng authored
* fix bugs. * eval on both validation and testing. * add script. * update. * update launch. * make train_dist.py independent. * update readme. * update readme. * update readme. * update readme. * generate undirected graph. * rename conf_file to part_config * use rsync * make train_dist independent. Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
Chao Ma authored
* update * update * fix lint * update * update
-