- 21 Nov, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] instantiate NodeDataView in lazy mode * fix test failure * init node/edge data store at the very beginning * fix test failures * refine comment * add more tests
-
- 07 Nov, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] enable access DistGraph.edges via canonical etype * refine code * refine test * refine code
-
- 01 Nov, 2022 1 commit
-
-
peizhou001 authored
* add save/load for distributed optimizer
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-19.ap-northeast-1.compute.internal>
-
- 17 Oct, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] Reduce peak memory in DistDGL: avoid validation, release memory once loaded * remove orig_id from ndata/edata for partition_graph() * delete orig_id from ndata/edata in dist part pipeline * reduce dtype size and format before saving graphs * fix lint * ETYPE requires to be int32/64 for CSRSortByTag * fix test failure * refine
-
- 10 Oct, 2022 1 commit
-
-
Hongzhi (Steve), Chen authored
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 30 Sep, 2022 1 commit
-
-
Quan (Andy) Gan authored
* first commit * add test * fixes * ah this is how you skip setup * fix * ugh * address comments * i like black
-
- 03 Aug, 2022 1 commit
-
-
Rhett Ying authored
-
- 11 Jul, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] format dtypes when loading graph in server * add test * refine * add comments
-
- 09 Jun, 2022 1 commit
-
-
Rhett Ying authored
-
- 14 Mar, 2022 1 commit
-
-
Rhett Ying authored
* [BugFix] pass ntype/etype into partition book when node/edge_split * fix test failure * fix test failure on mxnet * fix test failure
-
- 30 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Fix] sleep for a while when launching clients which will connect to multiple servers * pre-allocate more ports * no multiple partitions on single machine
-
- 28 Jan, 2022 1 commit
-
-
Rhett Ying authored
-
- 26 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Feature] long live server for multiple client groups * generate globally unique name for DistTensor within DGL automatically
-
- 19 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Fix] reduce error msg, refine fetch logic of available ports * un-initialize client before sending shutdown request * fix import error * print connect failure log only in debug mode * enable DMLC_LOG_DEBUG=1 in CI
-
- 11 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Feature] enable TP::Receiver wait for any numbers of senders * fix random unit test failure * avoid endless future wait * fix unit test failure * fix seg fault when finalize wait in receiver * [Feature] refactor sender connect logic and remove unnecessary sleeps in unit tests * fix lint * release RPCContext resources before process exits * [Debug] TPReceiver wait start log * [Debug] add log in get port * [Debug] add log * [ReDebug] revert time sleep in unit tests * [Debug] remove sleep for test_distri,test_mp * [debug] add more log * [debug] add listen_booted_ flag * [debug] restore commented code for queue * [debug] sleep more in rpc_client * restore change in tests * Revert "restore change in tests" This reverts commit 41a18926d181ec2517069389bfc41de2cc949280. * Revert "[debug] sleep more in rpc_client" This reverts commit a908e758eabca0a6ce62eb2e59baea02a840ac67. * Revert "[debug] restore commented code for queue" This reverts commit d3f993b3746e6bb6e2cc2f90204dd7e9461c6301. * Revert "[debug] add listen_booted_ flag" This reverts commit 244b2167d94942ff2a0acec8823b974975e52580. * Revert "[debug] add more log" This reverts commit 4b78447b0a575a824821dc7e25cca2246e6e30e2. * Revert "[Debug] remove sleep for test_distri,test_mp" This reverts commit e1df1aadcc8b1c2a0013ed77322ac391a8807612. * remove debug code * revert unnecessary change * revert unnecessary changes * always reset RPCContext when get started and reset all data * remove time.sleep in dist tests * fix lint * reset envs before each dist test * reset env properly * add time sleep when start each server * sleep for a while when boot server * replace wait_thread with callback * fix lint * add dglconnect handshake check Co-authored-by:Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 12 Oct, 2021 1 commit
-
-
Rhett Ying authored
-
- 28 Jul, 2021 1 commit
-
-
xiang song(charlie.song) authored
* fix. * fix. * fix. * fix. * Fix test * Deprecate old DistEmbedding impl, use synchronized embedding impl * Basic impl of heterogeneous on homogeneous sampling * make pass * Pass C++ test * Add python test code * lint * lint * Add MultiLayerEtypeNeighborSampler * Add unit test for single machine dataloader * Add dist dataloader test for edge type sampler * Fix lint * fix * support for per etype sample * Fix some bug and enable distributed training with per edge sample * fix * Now distributed training works * turn off some mxnet * turn off mxnet for some dist test * fix * upd * upd according to the comments * Fix * Fix test and now distributed works. * upd * upd * Fix * Fix bug * remove dead code. * upd * Fix * upd * Fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 17 Jul, 2021 1 commit
-
-
Da Zheng authored
* support hetero RGCN. * fix. * simplify code. * sample_neighbors return heterograph directly. * avoid using to_heterogeneous. * compute canonical etypes in advance. * fix tests. * fix. * fix distributed data loader for heterograph. * use NodeDataLoader. * fix bugs in partitioning on heterogeneous graphs. * fix lint. * fix tests. * fix. * fix. * fix bugs. * fix tests. * fix. * enable coo for distributed. * fix. * fix. * fix. * fix. * fix.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
- 13 Jul, 2021 1 commit
-
-
xiang song(charlie.song) authored
* fix. * fix. * fix. * fix. * Fix test * Deprecate old DistEmbedding impl, use synchronized embedding impl * update doc
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 05 Jul, 2021 1 commit
-
-
Da Zheng authored
* fix. * fix. * fix. * fix. * Fix test
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 02 Jul, 2021 1 commit
-
-
Jinjing Zhou authored
* try enable kvstore test * fix * fix * separate out kvstore test * add comment
-
- 16 Jun, 2021 1 commit
-
-
Da Zheng authored
* add. * fix. * fix. * fix. * fix. * add tests. * support node split and edge split. * support 1 partition. * add tests. * fix. * fix test. * use hierarchical partition. * add check.
Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-22-57.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 03 May, 2021 1 commit
-
-
xiang song(charlie.song) authored
* Draft for sparse emb * add some notes * Fix * Add sparse optim for dist pytorch * Update test * Fix * upd * upd * Fix * Fix * Fix bug * add transductive example * Fix example * Some fix * Upd * Fix lint * lint * lint * lint * upd * Fix lint * lint * upd * remove dead import * update * lint * update unit test * update example * Add adam optimizer * Add unit test and update data * upd * upd * upd * Fix docstring and fix some bug in example code * Update rgcn readme
Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-25.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 13 Apr, 2021 1 commit
-
-
Da Zheng authored
* fix. * test distributed graph without node/edge data. * remove some tests. * fix lint
-
- 30 Mar, 2021 1 commit
-
-
Da Zheng authored
* remove num_workers. * remove num_workers. * remove num_workers. * remove num-servers. * update error message. * update docstring. * fix docs. * fix tests. * fix test. * fix. * print messages in test. * fix. * fix test. * fix.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
-
- 25 Jan, 2021 1 commit
-
-
Da Zheng authored
* Distributed heterograph (#3) * heterogeneous graph partition. * fix graph partition book for heterograph. * load heterograph partitions. * update DistGraphServer to support heterograph. * make DistGraph runnable for heterograph. * partition a graph and store parts with homogeneous graph structure. * update DistGraph server&client to use homogeneous graph. * shuffle node Ids based on node types. * load mag in heterograph. * fix per-node-type mapping. * balance node types. * fix for homogeneous graph * store etype for now. * fix data name. * fix a bug in example. * add profiler in rgcn. * heterogeneous RGCN. * map homogeneous node ids to hetero node ids. * fix graph partition book. * fix DistGraph. * shuffle eids. * verify eids and their mappings when loading a partition. * Id map from homogneous Ids to per-type Ids. * verify partitioned results. * add test for distributed sampler. * add mapping from per-type Ids to homogeneous Ids. * update example. * fix DistGraph. * Revert "add profiler in rgcn." This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676. * add tests for homogeneous graphs. * fix a bug. * fix test. * fix for one partition. * fix for standalone training and evaluation. * small fix. * fix two bugs. * initialize projection matrix. * small fix on RGCN. * Fix rgcn performance (#17) Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix. * fix test. * fix lint. * test partitions. * remove redundant test for partitioning. * remove commented code. * fix partition. * fix tests. * fix RGCN. * fix test. * fix test. * fix test. * fix. * fix a bug. * update dmlc-core. * fix. * fix rgcn. * update readme. * add comments. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix. * fix. * add div_int. * fix. * fix. * fix lint. * fix. * fix. * fix. * adjust. * move code. * handle heterograph. * return pytorch tensor in GPB. * remove some tests in example. * add to_block for distributed training. * use distributed to_block. * remove unnecessary function in DistGraph. * remove distributed to_block. * use pytorch tensor. * fix a bug in ntypes and etypes. * enable norm. * make the data loader compatible with the old format. * fix. * add comments. * fix a bug. * add test for heterograph. * support partition without reshuffle. * add test. * support partition without reshuffle. * fix. * add test. * fix bugs. * fix lint. * fix dataset. * fix for mxnet. * update docstring. * rename to floor_div * avoid exposing NodePartitionPolicy and EdgePartitionPolicy. * fix docstring. * fix error. * fixes. * fix comments. * rename. * rename. * explain IdMap. * fix docstring. * fix docstring. * update docstring. * remove the code of returning heterograph. * remove argument. * fix example. * make GraphPartitionBook an abstract class. * fix. * fix. * fix a bug. * fix a bug in example * fix a bug * reverse heterograph sampling. * temp fix. * fix lint. * Revert "temp fix." This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381. * compute norm. * Revert "reverse heterograph sampling." This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9. * fix. * move id_map.py * remove check * add more comments. * update docstring. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
-
- 17 Aug, 2020 1 commit
-
-
Mufei Li authored
* Update graph * Fix for dgl.graph * from_scipy * Replace canonical_etypes with relations * from_networkx * Update for hetero_from_relations * Roll back the change of canonical_etypes to relations * heterograph * bipartite * Update doc * Fix lint * Fix lint * Fix test cases * Fix * Fix * Fix * Fix * Fix * Fix * Update * Fix test * Fix * Update * Use DGLError * Update * Update * Update * Update * Fix * Fix * Fix * Fix * Fix * Fix * Fix * Fix * Update * Fix * Update * Fix * Fix * Fix * Update * Fix * Update * Fix * Update * Update * Update * Update * Update * Update * Update * Fix * Fix * Update * Update * Update * Update * Update * Update * rewrite sanity checks * delete unnecessary checks * Update * Update * Update * Update * Update * Update * Update * Update * Fix * Update * Update * Update * Fix * Fix * Fix * Update * Fix * Update * Fix * Fix * Update * Fix * Update * Fix Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by:
Quan Gan <coin2028@hotmail.com>
-
- 14 Aug, 2020 2 commits
- 12 Aug, 2020 1 commit
-
-
Da Zheng authored
* rename get_data_size. * remove g from DistTensor. * remove g from DistEmbedding. * clean up API of graph partition book. * fix DistGraph * fix lint. * collect all part policies. * fix. * fix. * support distributed sampler. * remove partition.py
-
- 11 Aug, 2020 1 commit
-
-
Chao Ma authored
* remove server_count from ip_config.txt * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * lint * update * update * update * update * update * update * update * update * update * update * update * update * update * Update dist_context.py * fix lint. * make it work for multiple spaces. * update ip_config.txt. * fix examples. * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 10 Aug, 2020 2 commits
-
-
Da Zheng authored
* fix tests. * fix. * remove a test. * make code work in the standalone mode. * fix example. * more fix. * make DistDataloader work with num_workers=0 * fix DistDataloader tests. * fix. * fix lint. * fix cleanup. * fix test * remove unnecessary code. * remove tests. * fix. * fix. * fix. * fix example * fix. * fix. * fix launch script.
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
Da Zheng authored
* fix perf. * fix. * accelerate metis. * fix lint. * use gklib. * fix perf. * fix. * update metis. * update launch script * handle synchronized API. * fix. * fix example. * fix dataloader. * temp fix. * temp fix omp. * distinguish roles. * initialize iterator of DistDataloader correctly. * check the correctness of launch script. * move feature copy to sampler. * measure mem/network copy time. * remove * Revert "measure mem/network copy time." This reverts commit 86cefdc14b7815fcf5aad6496af912dba48e4aa6. * fix. * fix * fix. * fix cmake. * disable metis in windows. * disable metis tests in windows. * remove test for multigraph. * fix test. * fix. * fix cmake. * fix. * revert.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 08 Aug, 2020 1 commit
-
-
Da Zheng authored
* distinguish roles. * add comments. * fix lint. * move roles to server_state. * fix text. * fix tests. * fix tests. * Revert "fix tests." This reverts commit 5baa136b872a4550d4e612bfb1dfe363d7814adf.
-
- 05 Aug, 2020 1 commit
-
-
Jinjing Zhou authored
* 111 * 111 * fix * 111 * fix * 11 * fix * lint * Update __init__.py * lint * fix * lint * fix * fix * fix * fix * fix * try fix * try fix * fix * Revert "fix" This reverts commit a0b954fd4e99b7df92b53db8334dcb583d6e1551. * fixes. * fix. * fix test. * fix exit. * fix. * fix * fix * lint * lint * lint * fix * Update .gitignore * 111 * fix * 111 * 111 * fff * 1111 * 111 * 1325315 * ffff * f??? * fff * 1111 * 111 * fix * 111 * asda * 1111 * 11 * 123 * ahhhhhhhhhhhhhhhh * spawn * 1231231 * up * 111 * fix * fix * Revert "fix" This reverts commit 7373f95312fdcaa36d2fc330bf242339e89c045d. * fix * fix * 1111 * fix * fix tests * start kvclient as early as possible. * lint * fix test * lint * 1111 * fix * fix * 111 * fix * fix * 1 * fix * fix * lint * fix * lint * lint * remove quit * fix * lint * fix * fix several * lint * fix minor * fix * lint
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 03 Aug, 2020 1 commit
-
-
Da Zheng authored
* client init graph on the backup servers. * fix. * test multi-server. * fix anonymous dist tensors. * check #parts. * fix init_data * add multi-server multi-client tests. * update tests in kvstore. * fix. * verify the loaded partition. * fix a bug. * fix lint. * fix. * fix example. * fix rpc. * fix pull/push handler for backup kvstore * fix example readme. * change ip. * update docstring.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 31 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix bugs. * eval on both validation and testing. * add script. * update. * update launch. * make train_dist.py independent. * update readme. * update readme. * update readme. * update readme. * generate undirected graph. * rename conf_file to part_config * use rsync * make train_dist independent.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 29 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix tests in partition. * fix DistGraph. * fix without shared memory. * fix sampling. * enable distributed test. * fix tests. * fix a bug in shared-mem heterograph. * print better error messages. * fix. * don't specify formats. * fix. * fix * small fix.
-
- 27 Jul, 2020 2 commits