- 09 Feb, 2022 1 commit
-
-
Xin Yao authored
* implement pin_memory/unpin_memory/is_pinned for dgl.graph * update python docstring * update c++ docstring * add test * fix the broken UnifiedTensor * XPU_SWITCH for kDLCPUPinned * a rough version ready for testing * eliminate extra context parameter for pin/unpin * update train_sampling * fix linting * fix typo * multi-gpu uva sampling case * disable new format materialization for pinned graphs * update python doc for pin_memory_ * fix unit test * UVA sampling for link prediction * dispatch most csr ops * update graphsage example to combine uva sampling and UnifiedTensor * update graphsage example to combine uva sampling and UnifiedTensor * update graphsage example to combine uva sampling and UnifiedTensor * update doc * update examples * change unitgraph and heterograph's PinMemory to in-place * update examples for multi-gpu uva sampling * update doc * fix linting * fix cpu build * fix is_pinned for DistGraph * fix is_pinned for DistGraph * update graphsage unsupervised example * update doc for gpu sampling * update some check for sampling device switching * fix linting * adapt for new dataloader * fix linting * fix * fix some name issue * adjust device check * add unit test for uva sampling & fix some zero_copy bug * fix linting * update num_threads in graphsage examples Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com> Co-authored-by:
Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 30 Jan, 2022 1 commit
-
-
Quan (Andy) Gan authored
* initial update * more * more * multi-gpu example * cluster gcn, finalize homogeneous * more explanation * fix * bunch of fixes * fix * RGAT example and more fixes * shadow-gnn sampler and some changes in unit test * fix * wth * more fixes * remove shadow+node/edge dataloader tests for possible ux changes * lints * add legacy dataloading import just in case * fix * update pylint for f-strings * fix * lint * lint * lint again * cherry-picking commit fa9f494 * oops * fix * add sample_neighbors in dist_graph * fix * lint * fix * fix * fix * fix tutorial * fix * fix * fix * fix warning * remove debug * add get_foo_storage apis * lint
-
- 26 Jan, 2022 1 commit
-
-
Rhett Ying authored
* [Feature] long live server for multiple client groups * generate globally unique name for DistTensor within DGL automatically
-
- 10 Aug, 2021 1 commit
-
-
https://github.com/dmlc/dgl/pull/3131
xiang song(charlie.song) authored
* Fix bug * Fix * Fix * upd * trigger
-
- 06 Aug, 2021 1 commit
-
-
xiang song(charlie.song) authored
* Fix dist negative data loader bug * upd * Fix Co-authored-by:Da Zheng <zhengda1936@gmail.com>
-
- 05 Aug, 2021 1 commit
-
-
freeliuzc authored
* add count_nonzero function for DistTensor * change the load method of local data Co-authored-by:
liuzichang04 <liuzichang04@meituan.com> Co-authored-by:
Jinjing Zhou <VoVAllen@users.noreply.github.com> Co-authored-by:
Da Zheng <zhengda1936@gmail.com>
-
- 28 Jul, 2021 2 commits
-
-
Da Zheng authored
* make heterogeneous find_edges * add distributed EdgeDataLoader. * fix. * fix a bug. * fix bugs. * add tests on distributed heterogeneous graph sampling. * fix. Co-authored-by:Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
xiang song(charlie.song) authored
* fix. * fix. * fix. * fix. * Fix test * Deprecate old DistEmbedding impl, use synchronized embedding impl * Basic impl of heterogeneous on homogeneous sampling * make pass * Pass C++ test * Add python test code * lint * lint * Add MultiLayerEtypeNeighborSampler * Add unit test for single machine dataloader * Add dist dataloader test for edge type sampler * Fix lint * fix * support for per etype sample * Fix some bug and enable distributed training with per edge sample * fix * Now distributed training works * turn off some mxnet * turn off mxnet for some dist test * fix * upd * upd according to the comments * Fix * Fix test and now distributed works. * upd * upd * Fix * Fix bug * remove dead code. * upd * Fix * upd * Fix Co-authored-by:
Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal> Co-authored-by:
Da Zheng <zhengda1936@gmail.com>
-
- 17 Jul, 2021 1 commit
-
-
Da Zheng authored
* support hetero RGCN. * fix. * simplify code. * sample_neighbors return heterograph directly. * avoid using to_heterogeneous. * compute canonical etypes in advance. * fix tests. * fix. * fix distributed data loader for heterograph. * use NodeDataLoader. * fix bugs in partitioning on heterogeneous graphs. * fix lint. * fix tests. * fix. * fix. * fix bugs. * fix tests. * fix. * enable coo for distributed. * fix. * fix. * fix. * fix. * fix. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by:
Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
- 15 Jul, 2021 1 commit
-
-
Jingcheng Yu authored
Co-authored-by:
yujingcheng02 <yujingcheng02@meituan.com> Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com>
-
- 05 Jul, 2021 1 commit
-
-
Da Zheng authored
* fix. * fix. * fix. * fix. * Fix test Co-authored-by:
Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
-
- 16 Jun, 2021 1 commit
-
-
Da Zheng authored
* add. * fix. * fix. * fix. * fix. * add tests. * support node split and edge split. * support 1 partition. * add tests. * fix. * fix test. * use hierarchical partition. * add check. Co-authored-by:
Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-22-57.us-west-2.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 02 Jun, 2021 1 commit
-
-
Da Zheng authored
Co-authored-by:Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
-
- 26 May, 2021 1 commit
-
-
Da Zheng authored
* explicitly set the graph format. * fix. * fix. * fix launch script. * fix readme. Co-authored-by:
Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
-
- 18 May, 2021 1 commit
-
-
Da Zheng authored
* add distributed in-degree and out-degree. * update comments. * fix a bug. * add tests. * add tests. * fix a bug. * fix docstring. * update doc. * fix * fix. Co-authored-by:
Zheng <dzzhen@3c22fba32af5.ant.amazon.com> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com>
-
- 25 Jan, 2021 1 commit
-
-
Da Zheng authored
* Distributed heterograph (#3) * heterogeneous graph partition. * fix graph partition book for heterograph. * load heterograph partitions. * update DistGraphServer to support heterograph. * make DistGraph runnable for heterograph. * partition a graph and store parts with homogeneous graph structure. * update DistGraph server&client to use homogeneous graph. * shuffle node Ids based on node types. * load mag in heterograph. * fix per-node-type mapping. * balance node types. * fix for homogeneous graph * store etype for now. * fix data name. * fix a bug in example. * add profiler in rgcn. * heterogeneous RGCN. * map homogeneous node ids to hetero node ids. * fix graph partition book. * fix DistGraph. * shuffle eids. * verify eids and their mappings when loading a partition. * Id map from homogneous Ids to per-type Ids. * verify partitioned results. * add test for distributed sampler. * add mapping from per-type Ids to homogeneous Ids. * update example. * fix DistGraph. * Revert "add profiler in rgcn." This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676. * add tests for homogeneous graphs. * fix a bug. * fix test. * fix for one partition. * fix for standalone training and evaluation. * small fix. * fix two bugs. * initialize projection matrix. * small fix on RGCN. * Fix rgcn performance (#17) Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix. * fix test. * fix lint. * test partitions. * remove redundant test for partitioning. * remove commented code. * fix partition. * fix tests. * fix RGCN. * fix test. * fix test. * fix test. * fix. * fix a bug. * update dmlc-core. * fix. * fix rgcn. * update readme. * add comments. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix. * fix. * add div_int. * fix. * fix. * fix lint. * fix. * fix. * fix. * adjust. * move code. * handle heterograph. * return pytorch tensor in GPB. * remove some tests in example. * add to_block for distributed training. * use distributed to_block. * remove unnecessary function in DistGraph. * remove distributed to_block. * use pytorch tensor. * fix a bug in ntypes and etypes. * enable norm. * make the data loader compatible with the old format. * fix. * add comments. * fix a bug. * add test for heterograph. * support partition without reshuffle. * add test. * support partition without reshuffle. * fix. * add test. * fix bugs. * fix lint. * fix dataset. * fix for mxnet. * update docstring. * rename to floor_div * avoid exposing NodePartitionPolicy and EdgePartitionPolicy. * fix docstring. * fix error. * fixes. * fix comments. * rename. * rename. * explain IdMap. * fix docstring. * fix docstring. * update docstring. * remove the code of returning heterograph. * remove argument. * fix example. * make GraphPartitionBook an abstract class. * fix. * fix. * fix a bug. * fix a bug in example * fix a bug * reverse heterograph sampling. * temp fix. * fix lint. * Revert "temp fix." This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381. * compute norm. * Revert "reverse heterograph sampling." This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9. * fix. * move id_map.py * remove check * add more comments. * update docstring. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
-
- 08 Jan, 2021 1 commit
-
-
Quan (Andy) Gan authored
* return indices from dataloader * fixes * fix * fix distgraph and added some todos * Update dataloader.py * Update dataloader.py Co-authored-by:xiang song(charlie.song) <classicxsong@gmail.com>
-
- 19 Aug, 2020 1 commit
-
-
Da Zheng authored
* quick fix. * update sparse optimizer. * fix. * fix
-
- 18 Aug, 2020 1 commit
-
-
Da Zheng authored
* add doc. * update DistGraph. * add DistTensor. * update DistEmbedding. * add partition.py * add sampling. * fix. * add graph partition book and create a base class. * fix test. * add rst. * update doc rst. * update. * fix. * fix docs * update distributed tensor and embeddings. * add checks. * update DistGraph. * update initialization. * fix graph partition book. * update graph partition book. * update partition. * update partition. * fix. * add example code. * update DistGraph * Update python/dgl/distributed/dist_context.py Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_context.py Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_dataloader.py Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_dataloader.py Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com> * Update python/dgl/distributed/dist_dataloader.py Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com> * update initialize. * update dataloader. * update distgraph. * update DistGraph. * update DistTensor. * update. * more updates. * fix lint. * add num_nodes and num_edges Co-authored-by:
Chao Ma <mctt90@gmail.com> Co-authored-by:
Quan (Andy) Gan <coin2028@hotmail.com> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com>
-
- 12 Aug, 2020 1 commit
-
-
Da Zheng authored
* rename get_data_size. * remove g from DistTensor. * remove g from DistEmbedding. * clean up API of graph partition book. * fix DistGraph * fix lint. * collect all part policies. * fix. * fix. * support distributed sampler. * remove partition.py
-
- 11 Aug, 2020 1 commit
-
-
Chao Ma authored
* remove server_count from ip_config.txt * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * lint * update * update * update * update * update * update * update * update * update * update * update * update * update * Update dist_context.py * fix lint. * make it work for multiple spaces. * update ip_config.txt. * fix examples. * update * update * update * update * update * update * update * update * update * update * update * update * update * update * udpate * update * update * update * update * update Co-authored-by:
Da Zheng <zhengda1936@gmail.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 10 Aug, 2020 1 commit
-
-
Da Zheng authored
* fix tests. * fix. * remove a test. * make code work in the standalone mode. * fix example. * more fix. * make DistDataloader work with num_workers=0 * fix DistDataloader tests. * fix. * fix lint. * fix cleanup. * fix test * remove unnecessary code. * remove tests. * fix. * fix. * fix. * fix example * fix. * fix. * fix launch script. Co-authored-by:
Jinjing Zhou <VoVAllen@users.noreply.github.com> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 08 Aug, 2020 1 commit
-
-
Da Zheng authored
* distinguish roles. * add comments. * fix lint. * move roles to server_state. * fix text. * fix tests. * fix tests. * Revert "fix tests." This reverts commit 5baa136b872a4550d4e612bfb1dfe363d7814adf.
-
- 05 Aug, 2020 1 commit
-
-
Jinjing Zhou authored
* 111 * 111 * fix * 111 * fix * 11 * fix * lint * Update __init__.py * lint * fix * lint * fix * fix * fix * fix * fix * try fix * try fix * fix * Revert "fix" This reverts commit a0b954fd4e99b7df92b53db8334dcb583d6e1551. * fixes. * fix. * fix test. * fix exit. * fix. * fix * fix * lint * lint * lint * fix * Update .gitignore * 111 * fix * 111 * 111 * fff * 1111 * 111 * 1325315 * ffff * f??? * fff * 1111 * 111 * fix * 111 * asda * 1111 * 11 * 123 * aaaaaaaaaaaaaaaa * spawn * 1231231 * up * 111 * fix * fix * Revert "fix" This reverts commit 7373f95312fdcaa36d2fc330bf242339e89c045d. * fix * fix * 1111 * fix * fix tests * start kvclient as early as possible. * lint * fix test * lint * 1111 * fix * fix * 111 * fix * fix * 1 * fix * fix * lint * fix * lint * lint * remove quit * fix * lint * fix * fix several * lint * fix minor * fix * lint Co-authored-by:
Da Zheng <zhengda1936@gmail.com> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com>
-
- 03 Aug, 2020 1 commit
-
-
Da Zheng authored
* client init graph on the backup servers. * fix. * test multi-server. * fix anonymous dist tensors. * check #parts. * fix init_data * add multi-server multi-client tests. * update tests in kvstore. * fix. * verify the loaded partition. * fix a bug. * fix lint. * fix. * fix example. * fix rpc. * fix pull/push handler for backup kvstore * fix example readme. * change ip. * update docstring. Co-authored-by:Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 01 Aug, 2020 1 commit
-
-
xiang song(charlie.song) authored
* Standalone can run * fix * Add save * Fix * Fix * Fix * Fix * debug * test * test * Fix * Fix * log * Fix * fix * Profile * auto sync grad * update * add test for unsupervised dist training * upd * Fix lr * Fix update * sync * fix * Revert "fix" This reverts commit d5caa7398b36125f6d6e2c742a95c6ff4298c9e9. * Fix * unsupervised * Fix * remove debug * Add test case for dist_graph find_edges() * Fix * skip tensorflow test for find_edges * Update readme * remove some test * upd * Update partition_graph.py Co-authored-by:
Ubuntu <ubuntu@ip-172-31-68-185.ec2.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal> Co-authored-by:
Da Zheng <zhengda1936@gmail.com>
-
- 31 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix bugs. * eval on both validation and testing. * add script. * update. * update launch. * make train_dist.py independent. * update readme. * update readme. * update readme. * update readme. * generate undirected graph. * rename conf_file to part_config * use rsync * make train_dist independent. Co-authored-by:
Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal> Co-authored-by:
Ubuntu <ubuntu@ip-172-31-19-115.us-west-2.compute.internal> Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com>
-
- 29 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix tests in partition. * fix DistGraph. * fix without shared memory. * fix sampling. * enable distributed test. * fix tests. * fix a bug in shared-mem heterograph. * print better error messages. * fix. * don't specify formats. * fix. * fix * small fix.
-
- 28 Jul, 2020 2 commits
-
-
Jinjing Zhou authored
This reverts commit 6557291f.
-
Jinjing Zhou authored
* 111 * 111 * fix * 111 * fix * 11 * fix * lint * Update __init__.py * lint * fix * lint * fix * fix * fix * fix * fix * try fix * try fix * fix * Revert "fix" This reverts commit a0b954fd4e99b7df92b53db8334dcb583d6e1551. * fixes. * fix. * fix test. * fix exit. * fix. * fix * fix * lint * lint * lint * fix * Update .gitignore Co-authored-by:Da Zheng <zhengda1936@gmail.com>
-
- 27 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix node/edge_split. * fix partition. * support heterograph interface. * fix test. * fix * fix docstring.
-
- 22 Jul, 2020 1 commit
-
-
Da Zheng authored
* add eval. * extend DistTensor. * fix. * add barrier. * add more print. * add more checks in kvstore. * fix lint. * get all neighbors for eval. * reorganize. * fix. * fix. * fix. * fix test. * add reuse_if_exist. * add test for reuse_if_exist. * fix lint. * fix bugs. * fix. * print errors of tcp socket. * support delete tensors. * fix lint. * fix * fix example Co-authored-by:Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
-
- 20 Jul, 2020 1 commit
-
-
Chao Ma authored
* exit client * update * update * update * update * update * update * update * update test * update * update * update * update * update * update * update * update * update
-
- 15 Jul, 2020 1 commit
-
-
Da Zheng authored
* add standalone mode * add comments. * add tests for sampling. * fix. * make the code to run the standalone mode * fix * fix * fix readme. * fix. * fix test Co-authored-by:Chao Ma <mctt90@gmail.com>
-
- 14 Jul, 2020 1 commit
-
-
Da Zheng authored
* run dist server in dgl. * fix bugs. * fix example. * check environment variables and fix lint. * fix lint
-
- 09 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix * use utils.toindex in the right place. * fix. * update tensor for mxnet backend. * fix * fix
-
- 06 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix * fix. * update. * fix * add assert Co-authored-by:Chao Ma <mctt90@gmail.com>
-
- 03 Jul, 2020 1 commit
-
-
Da Zheng authored
* add sparse embedding. * fix * add test. * man fixes. * many fixes * fix sparse emb. * fix. * fix lint. * fix lint. * fix kvstore. * expose DistTensor. * test sparse embeddings. * add attach_grad to the backends. * remove part_id * fix. * move backward computation. * move more computation to backend. * fix a bug when applying learning rate. * fix a few things. * fix a few things. * add docstring * fix. * apply no_grad. * fix tests. * fix for other frameworks. * add examples in docstring.
-
- 01 Jul, 2020 1 commit
-
-
Da Zheng authored
* fix. * fix tests. * fix * add tests. * fix. * have default rank. * add comment. * fix test. * remove check * simplify code. * add test. * split data evenly. * simplify the distributed training code. * add comments. * add comments.
-
- 28 Jun, 2020 1 commit
-
-
Da Zheng authored
* add train_dist. * Fix sampling example. * use distributed sampler. * fix a bug in DistTensor. * fix distributed training example. * add graph partition. * add command * disable pytorch parallel. * shutdown correctly. * load diff graphs. * add ip_config.txt. * record timing for each step. * use ogb * add profiler. * fix a bug. * add train_dist. * Fix sampling example. * use distributed sampler. * fix a bug in DistTensor. * fix distributed training example. * add graph partition. * add command * disable pytorch parallel. * shutdown correctly. * load diff graphs. * add ip_config.txt. * record timing for each step. * use ogb * add profiler. * add Ips of the cluster. * fix exit. * support multiple clients. * balance node types and edges. * move code. * remove run.sh * Revert "support multiple clients." * fix. * update train_sampling. * fix. * fix * remove run.sh * update readme. * update readme. * use pytorch distributed. * ensure all trainers run the same number of steps. * Update README.md Co-authored-by:Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>
-