- 11 Jul, 2022 2 commits
-
-
Rhett Ying authored
* [Dist] enable to specify sort_etype for sample_etype_neighbours * fix lint * pass argument instead of env * fix lint and doc string * refine args * remove unnecessary lines * debug only * debug add sort time log * change interface * fix typo Co-authored-by: Xin Yao <xiny@nvidia.com>
-
Rhett Ying authored
* [Dist] format dtypes when loading graph in server * add test * refine * add comments
-
- 20 Jun, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] defer to load node/edge feats * fix lint * Update python/dgl/distributed/partition.py Co-authored-by: Minjie Wang <minjie.wang@nyu.edu> * Update python/dgl/distributed/partition.py Co-authored-by: Minjie Wang <minjie.wang@nyu.edu> * fix lint Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
-
- 16 Jun, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] set socket as default backend for RPC * add tests both for socket and tensorpipe
-
- 09 Jun, 2022 1 commit
-
-
Rhett Ying authored
-
- 08 Jun, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] enable timeout when fetching msg * fix lint error * minor refinements * improve minor log * fix dist test * fix timeout issue in tensorpipe
-
- 18 May, 2022 1 commit
-
-
Rhett Ying authored
* [Dist][BugFix] enable sampling on bipartite * add comments for tests
-
- 11 May, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] Enable maximum try times for socket backend via DGL_DIST_MAX_TRY_TIMES * reset env before/after test * print log for info when trying to connect * fix * print log in python instead of cpp
-
- 27 Apr, 2022 1 commit
-
-
Rhett Ying authored
* [Feature] enable socket net_type for rpc * fix lint * fix lint * fix build issue on windows * fix test failure on windows * fix test failure * fix cpp unit test failure * net_type blocking max_try_times * fix other comments * fix lint * fix comment * fix lint * fix cpp
-
- 24 Mar, 2022 1 commit
-
-
Rhett Ying authored
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 14 Mar, 2022 1 commit
-
-
Rhett Ying authored
* [BugFix] pass ntype/etype into partition book when node/edge_split * fix test failure * fix test failure on mxnet * fix test failure
-
- 02 Mar, 2022 1 commit
-
-
Rhett Ying authored
-
- 30 Jan, 2022 2 commits
-
-
Rhett Ying authored
* [Fix] sleep for a while when launching clients which will connect to multiple servers * pre-allocate more ports * no multiple partitions on single machine
-
Quan (Andy) Gan authored
* initial update * more * more * multi-gpu example * cluster gcn, finalize homogeneous * more explanation * fix * bunch of fixes * fix * RGAT example and more fixes * shadow-gnn sampler and some changes in unit test * fix * wth * more fixes * remove shadow+node/edge dataloader tests for possible ux changes * lints * add legacy dataloading import just in case * fix * update pylint for f-strings * fix * lint * lint * lint again * cherry-picking commit fa9f494 * oops * fix * add sample_neighbors in dist_graph * fix * lint * fix * fix * fix * fix tutorial * fix * fix * fix * fix warning * remove debug * add get_foo_storage apis * lint
-
- 28 Jan, 2022 2 commits
-
-
Quan (Andy) Gan authored
* migrate to pylint 2.6.0 * fix * fix? * ??? * oops
-
Rhett Ying authored
-
- 26 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Feature] long live server for multiple client groups * generate globally unique name for DistTensor within DGL automatically
-
- 19 Jan, 2022 2 commits
-
-
Jinjing Zhou authored
-
Rhett Ying authored
* [Fix] reduce error msg, refine fetch logic of available ports * un-initialize client before sending shutdown request * fix import error * print connect failure log only in debug mode * enable DMLC_LOG_DEBUG=1 in CI
-
- 11 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Feature] enable TP::Receiver wait for any numbers of senders * fix random unit test failure * avoid endless future wait * fix unit test failure * fix seg fault when finalize wait in receiver * [Feature] refactor sender connect logic and remove unnecessary sleeps in unit tests * fix lint * release RPCContext resources before process exits * [Debug] TPReceiver wait start log * [Debug] add log in get port * [Debug] add log * [ReDebug] revert time sleep in unit tests * [Debug] remove sleep for test_distri,test_mp * [debug] add more log * [debug] add listen_booted_ flag * [debug] restore commented code for queue * [debug] sleep more in rpc_client * restore change in tests * Revert "restore change in tests" This reverts commit 41a18926d181ec2517069389bfc41de2cc949280. * Revert "[debug] sleep more in rpc_client" This reverts commit a908e758eabca0a6ce62eb2e59baea02a840ac67. * Revert "[debug] restore commented code for queue" This reverts commit d3f993b3746e6bb6e2cc2f90204dd7e9461c6301. * Revert "[debug] add listen_booted_ flag" This reverts commit 244b2167d94942ff2a0acec8823b974975e52580. * Revert "[debug] add more log" This reverts commit 4b78447b0a575a824821dc7e25cca2246e6e30e2. * Revert "[Debug] remove sleep for test_distri,test_mp" This reverts commit e1df1aadcc8b1c2a0013ed77322ac391a8807612. * remove debug code * revert unnecessary change * revert unnecessary changes * always reset RPCContext when get started and reset all data * remove time.sleep in dist tests * fix lint * reset envs before each dist test * reset env properly * add time sleep when start each server * sleep for a while when boot server * replace wait_thread with callback * fix lint * add dglconnect handshake check Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 06 Dec, 2021 2 commits
-
-
Jinjing Zhou authored
* doesn't know whether works * add change * fix * fix * fix * remove * revert * lint * lint * fix * revert * lint * fix * only build rpc on linux * lint * lint * fix build on windows * fix windows * remove old test * fix cmake * Revert "remove old test" This reverts commit f1ea75c777c34cdc1f08c0589676ba6aee1feb29. * fix windows * fix * fix * fix indent * fix indent * address comment * fix * fix * fix * fix * fix * lint * fix indent * fix lint * add introduction * fix * lint * lint * add more logs * fix * update xbyak for C++14 with gcc5 * Remove channels * fix * add test script * fix * remove unused file * fix lint * add timeout
-
Quan (Andy) Gan authored
* first commit * second commit * spaghetti unit tests * rewrite test
-
- 12 Oct, 2021 1 commit
-
-
Rhett Ying authored
-
- 01 Sep, 2021 1 commit
-
-
xiang song(charlie.song) authored
[Feature] Add a HINT for the per edge type sampler of heterogeneous DistGraph highlighting that the etypes are already sorted. (#3260) * pass cpp test * distgraph use sorted edge flag. * lint * trigger * update test Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 29 Aug, 2021 1 commit
-
-
Da Zheng authored
* handle empty frontiers. * fix lint. * fix Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 06 Aug, 2021 1 commit
-
-
xiang song(charlie.song) authored
* Fix dist negative data loader bug * upd * Fix Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 28 Jul, 2021 2 commits
-
-
Da Zheng authored
* make heterogeneous find_edges * add distributed EdgeDataLoader. * fix. * fix a bug. * fix bugs. * add tests on distributed heterogeneous graph sampling. * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
xiang song(charlie.song) authored
* fix. * fix. * fix. * fix. * Fix test * Deprecate old DistEmbedding impl, use synchronized embedding impl * Basic impl of heterogeneous on homogeneous sampling * make pass * Pass C++ test * Add python test code * lint * lint * Add MultiLayerEtypeNeighborSampler * Add unit test for single machine dataloader * Add dist dataloader test for edge type sampler * Fix lint * fix * support for per etype sample * Fix some bugs and enable distributed training with per edge sample * fix * Now distributed training works * turn off some mxnet * turn off mxnet for some dist test * fix * upd * upd according to the comments * Fix * Fix test and now distributed works. * upd * upd * Fix * Fix bug * remove dead code. * upd * Fix * upd * Fix Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal> Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 17 Jul, 2021 1 commit
-
-
Da Zheng authored
* support hetero RGCN. * fix. * simplify code. * sample_neighbors return heterograph directly. * avoid using to_heterogeneous. * compute canonical etypes in advance. * fix tests. * fix. * fix distributed data loader for heterograph. * use NodeDataLoader. * fix bugs in partitioning on heterogeneous graphs. * fix lint. * fix tests. * fix. * fix. * fix bugs. * fix tests. * fix. * enable coo for distributed. * fix. * fix. * fix. * fix. * fix. Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
- 13 Jul, 2021 1 commit
-
-
xiang song(charlie.song) authored
* fix. * fix. * fix. * fix. * Fix test * Deprecate old DistEmbedding impl, use synchronized embedding impl * update doc Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal> Co-authored-by: Da Zheng <zhengda1936@gmail.com> Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 05 Jul, 2021 1 commit
-
-
Da Zheng authored
* fix. * fix. * fix. * fix. * Fix test Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 02 Jul, 2021 2 commits
-
-
Da Zheng authored
* fix bugs in partitioning on heterogeneous graphs. * fix. * fix. * fix example. * fix. * fix test. * fix. * fix. * fix. * fix tests. Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
Jinjing Zhou authored
* try enable kvstore test * fix * fix * separate out kvstore test * add comment
-
- 25 Jun, 2021 1 commit
-
-
Quan (Andy) Gan authored
* csr and csc creation * fix * fix * fixes to adj transpose * fine * raise error if indptr did not match number of nodes * fix * huh? * oh Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 16 Jun, 2021 1 commit
-
-
Da Zheng authored
* add. * fix. * fix. * fix. * fix. * add tests. * support node split and edge split. * support 1 partition. * add tests. * fix. * fix test. * use hierarchical partition. * add check. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-22-57.us-west-2.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 26 May, 2021 1 commit
-
-
Da Zheng authored
* explicitly set the graph format. * fix. * fix. * fix launch script. * fix readme. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 18 May, 2021 1 commit
-
-
Da Zheng authored
* add distributed in-degree and out-degree. * update comments. * fix a bug. * add tests. * add tests. * fix a bug. * fix docstring. * update doc. * fix * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 03 May, 2021 1 commit
-
-
xiang song(charlie.song) authored
* Draft for sparse emb * add some notes * Fix * Add sparse optim for dist pytorch * Update test * Fix * upd * upd * Fix * Fix * Fix bug * add transductive example * Fix example * Some fix * Upd * Fix lint * lint * lint * lint * upd * Fix lint * lint * upd * remove dead import * update * lint * update unit test * update example * Add adam optimizer * Add unit test and update data * upd * upd * upd * Fix docstring and fix some bugs in example code * Update rgcn readme Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-25.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 26 Apr, 2021 1 commit
-
-
Da Zheng authored
* update distributed training doc. * explain data split. * fix message passing. * id mapping. * fix. * test data reshuffling. * fix a bug. * fix test. * Revert "fix." This reverts commit 2d025e9e1a5c05c3da9b803a035a788ced59bd77. * Revert "id mapping." This reverts commit 2a6a93ceb81fbdff86e6e9e5a58e1ace1e9d9882. * Revert "fix message passing." This reverts commit ed8a86bf2b015e5e4f64ba160e81b207ad2a1d65. * Revert "explain data split." This reverts commit 4338ddf8a336014cf92d4cb9a1db02b9badc0e55. * Revert "update distributed training doc." This reverts commit dda1c35c44536934c19715534f01f832afda6ad2. * add more tests. * fix. * fix. * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 22 Apr, 2021 1 commit
-
-
Da Zheng authored
* return mapping. * support heterogeneous graph. * more test. * fix lint. * fix for diff backends. * fix. * fix. Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-