- 14 Jan, 2021 1 commit
Quan (Andy) Gan authored
* fix munmap() using wrong parameter * rename variables
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 26 Dec, 2020 1 commit
Da Zheng authored
* delete shared memory when receive signal. * rename. * fix lint. * fix lint. * fix compile. * Fix. * we need to report error if the shared memory exist. * disable tensorflow test for shared memory. * revert.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 28 Jul, 2020 1 commit
Qidong Su authored
* update * update * update * update * fix * update * fix * update * update * win32 * update * fix * update * update * update * updat * update * update * fix * update * update * update * update * update * fix * TODO * 111 * fix * minor fix * minor fix * fox * Update shared_mem_manager.cc * update * update * update * update metis * update metis * update
Co-authored-by: VoVAllen <jz1749@nyu.edu>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 03 May, 2020 1 commit
Da Zheng authored
* initial version from distributed training. This is copied from multiprocessing training. * modify for distributed training. * it's runnable now. * measure time in neighbor sampling. * simplify neighbor sampling. * fix a bug in distributed neighbor sampling. * allow single-machine training. * fix a bug. * fix a bug. * fix openmp. * make some improvement. * fix. * add prepare in the sampler. * prepare nodeflow async. * fix a bug. * get id. * simplify the code. * improve. * fix partition.py * fix the example. * add more features. * fix the example. * allow one partition * use distributed kvstore. * do g2l map manually. * fix commandline. * a temp script to save reddit. * fix pull_handler. * add pytorch version. * estimate the time for copying data. * delete unused code. * fix a bug. * print id. * fix a bug * fix a bug * fix a bug. * remove redundent code. * revert modify in sampler. * fix temp script. * remove pytorch version. * fix. * distributed training with pytorch. * add distributed graph store. * fix. * add metis_partition_assignment. * fix a few bugs in distributed graph store. * fix test. * fix bugs in distributed graph store. * fix tests. * remove code of defining DistGraphStore. * fix partition. * fix example. * update run.sh. * only read necessary node data. * batching data fetch of multiple NodeFlows. * simplify gcn. * remove unnecessary code. * use the new copy_from_kvstore. * update training script. * print time in graphsage. * make distributed training runnable. * use val_nid. * fix train_sampling. * add distributed training. * add run.sh * add more timing. * fix a bug. * save graph metadata when partition. * create ndata and edata in distributed graph store. * add timing in minibatch training of GraphSage. * use pytorch distributed. * add checks. * fix a bug in global vs. local ids. * remove fast pull * fix a compile error. * update and add new APIs. * implement more methods in DistGraphStore. * update more APIs. * rename it to DistGraph. 
* rename to DistTensor * remove some unnecessary API. * remove unnecessary files. * revert changes in sampler. * Revert "simplify gcn." This reverts commit 0ed3a34ca714203a5b45240af71555d4227ce452. * Revert "simplify neighbor sampling." This reverts commit 551c72d20f05a029360ba97f312c7a7a578aacec. * Revert "measure time in neighbor sampling." This reverts commit 63ae80c7b402bb626e24acbbc8fdfe9fffd0bc64. * Revert "add timing in minibatch training of GraphSage." This reverts commit e59dc8957a414c7df5c316f51d78bce822bdef5e. * Revert "fix train_sampling." This reverts commit ea6aea9a4aabb8ba0ff63070aa51e7ca81536ad9. * fix lint. * add comments and small update. * add more comments. * add more unit tests and fix bugs. * check the existence of shared-mem graph index. * use new partitioned graph storage. * fix bugs. * print error in fast pull. * fix lint * fix a compile error. * save absolute path after partitioning. * small fixes in the example * Revert "[kvstore] support any data type for init_data() (#1465)" This reverts commit 87b6997b. * fix a bug. * disable evaluation. * Revert "Revert "[kvstore] support any data type for init_data() (#1465)"" This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee. * support set and init data. * support set and init data. * Revert "Revert "[kvstore] support any data type for init_data() (#1465)"" This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee. * fix bugs. * fix unit test. * move to dgl.distributed. * fix lint. * fix lint. * remove local_nids. * fix lint. * fix test. * remove train_dist. * revert train_sampling. * rename funcs. * address comments. * address comments. Use NodeDataView/EdgeDataView to keep track of data. * address comments. * address comments. * revert. * save data with DGL serializer. * use the right way of getting shape. * fix lint. * address comments. * address comments. * fix an error in mxnet. * address comments. * add edge_map. * add more test and fix bugs.
Co-authored-by: Zheng <dzzhen@186590dc80ff.ant.amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-131.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-26-167.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-150.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-135.us-west-2.compute.internal>
-
- 20 May, 2019 1 commit
Da Zheng authored
* fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature.
-
- 08 Apr, 2019 1 commit
Da Zheng authored
* accelerate gcn_ns. * add timing. * run infer with whole graph. * distributed gcn_ns. * reconstruct gcn_ns. * minor fix. * change graphsage_cv for numa. * fix #OMP threads. * accelerate graphsage_cv. * fix a weird bug. * add profiler in graphsage_cv. * accelerate graphsage_cv. manually aggregate neighbors' embeddings with pull. * load csr directly in gcn_ns_sc. * parallel sort for graph index. * Revert "parallel sort for graph index." This reverts commit 86fe2c7117fe5e56b0d481b39849c258b166945b. * run gcn_ns_sc on GPUs. * acc gcn_cv_sc. * change gcn_cv for numa. * fix gcn_cv to use numa and gpu. * improve graphsage_cv to use numa and gpu. * improve gcn_ns. * improve graphsage_cv. * init shared memory graph store. * fix. * enable init ndata. * improve tests. * add bidirectional communication. * link to rt. * fix compilation error. * fix shared memory init. * use MessageQueue for inter-process communication. * reconstruct immutable graph csr. * fix gcn. * load csr to shared memory. * fix minor bugs. * add comments. * refactor SharedMemory. * fix bugs in ImmutableGraph. * create CSR graph from shared memory. * add more test for loading a csr graph. * terminate graph store properly. * allow initializing ndata in the graph store server. * use RPC for inter-process communication. * a script for loading a graph. * allow customizing port. * list all ndata and edata. * support dtype. * reorganize SharedMemoryGraphStore. * fix ndata shape. * reconstruct gcn_ns. * print info. * set omp in gcn_ns. * reset sampling examples. * fix lint. * fix lint. * reset gcn. * disable shared memory in windows. * fix. * fix. * reset changes. * revert nodeflow changes. * fix cmake. * fix test. * fix test. * fix test. * fix test. * add comments. * fix test. * move vector out. * fix lint. * fix lint. * move SharedMemory. * update cmake. * update comment. * fix comments. * Revert "update cmake." This reverts commit 592445e37077f70a6e3f2e5245f9a3d086b04f3b. * update cmake. * add comments. 
* rename. * change the comment. * fix a bug. * rename. * add comments. * add comments. * add init_edata. * rewrite memory alloc. * move vector to CSR. * fix. * init data. * Revert "init data." This reverts commit 2b217b9553911b7dd84a9f1d9b68430b5aa18e23. * init data. * init new columns correctly.
-