    [Feature] Distributed graph store (#1383) · 2190c39d
    Da Zheng authored
    * initial version from distributed training.
    
    This is copied from multiprocessing training.
    
    * modify for distributed training.
    
    * it's runnable now.
    
    * measure time in neighbor sampling.
    
    * simplify neighbor sampling.
    
    * fix a bug in distributed neighbor sampling.
    
    * allow single-machine training.
    
    * fix a bug.
    
    * fix a bug.
    
    * fix openmp.
    
    * make some improvement.
    
    * fix.
    
    * add prepare in the sampler.
    
    * prepare nodeflow async.
    
    * fix a bug.
    
    * get id.
    
    * simplify the code.
    
    * improve.
    
    * fix partition.py
    
    * fix the example.
    
    * add more features.
    
    * fix the example.
    
    * allow one partition
    
    * use distributed kvstore.
    
    * do g2l map manually.
    
    * fix commandline.
    
    * a temp script to save reddit.
    
    * fix pull_handler.
    
    * add pytorch version.
    
    * estimate the time for copying data.
    
    * delete unused code.
    
    * fix a bug.
    
    * print id.
    
    * fix a bug
    
    * fix a bug
    
    * fix a bug.
    
    * remove redundant code.
    
    * revert modify in sampler.
    
    * fix temp script.
    
    * remove pytorch version.
    
    * fix.
    
    * distributed training with pytorch.
    
    * add distributed graph store.
    
    * fix.
    
    * add metis_partition_assignment.
    
    * fix a few bugs in distributed graph store.
    
    * fix test.
    
    * fix bugs in distributed graph store.
    
    * fix tests.
    
    * remove code of defining DistGraphStore.
    
    * fix partition.
    
    * fix example.
    
    * update run.sh.
    
    * only read necessary node data.
    
    * batching data fetch of multiple NodeFlows.
    
    * simplify gcn.
    
    * remove unnecessary code.
    
    * use the new copy_from_kvstore.
    
    * update training script.
    
    * print time in graphsage.
    
    * make distributed training runnable.
    
    * use val_nid.
    
    * fix train_sampling.
    
    * add distributed training.
    
    * add run.sh
    
    * add more timing.
    
    * fix a bug.
    
    * save graph metadata when partition.
    
    * create ndata and edata in distributed graph store.
    
    * add timing in minibatch training of GraphSage.
    
    * use pytorch distributed.
    
    * add checks.
    
    * fix a bug in global vs. local ids.
    
    * remove fast pull
    
    * fix a compile error.
    
    * update and add new APIs.
    
    * implement more methods in DistGraphStore.
    
    * update more APIs.
    
    * rename it to DistGraph.
    
    * rename to DistTensor
    
    * remove some unnecessary API.
    
    * remove unnecessary files.
    
    * revert changes in sampler.
    
    * Revert "simplify gcn."
    
    This reverts commit 0ed3a34ca714203a5b45240af71555d4227ce452.
    
    * Revert "simplify neighbor sampling."
    
    This reverts commit 551c72d20f05a029360ba97f312c7a7a578aacec.
    
    * Revert "measure time in neighbor sampling."
    
    This reverts commit 63ae80c7b402bb626e24acbbc8fdfe9fffd0bc64.
    
    * Revert "add timing in minibatch training of GraphSage."
    
    This reverts commit e59dc8957a414c7df5c316f51d78bce822bdef5e.
    
    * Revert "fix train_sampling."
    
    This reverts commit ea6aea9a4aabb8ba0ff63070aa51e7ca81536ad9.
    
    * fix lint.
    
    * add comments and small update.
    
    * add more comments.
    
    * add more unit tests and fix bugs.
    
    * check the existence of shared-mem graph index.
    
    * use new partitioned graph storage.
    
    * fix bugs.
    
    * print error in fast pull.
    
    * fix lint
    
    * fix a compile error.
    
    * save absolute path after partitioning.
    
    * small fixes in the example
    
    * Revert "[kvstore] support any data type for init_data() (#1465)"
    
    This reverts commit 87b6997b.
    
    * fix a bug.
    
    * disable evaluation.
    
    * Revert "Revert "[kvstore] support any data type for init_data() (#1465)""
    
    This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee.
    
    * support set and init data.
    
    * support set and init data.
    
    * Revert "Revert "[kvstore] support any data type for init_data() (#1465)""
    
    This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee.
    
    * fix bugs.
    
    * fix unit test.
    
    * move to dgl.distributed.
    
    * fix lint.
    
    * fix lint.
    
    * remove local_nids.
    
    * fix lint.
    
    * fix test.
    
    * remove train_dist.
    
    * revert train_sampling.
    
    * rename funcs.
    
    * address comments.
    
    * address comments.
    
    Use NodeDataView/EdgeDataView to keep track of data.
    
    * address comments.
    
    * address comments.
    
    * revert.
    
    * save data with DGL serializer.
    
    * use the right way of getting shape.
    
    * fix lint.
    
    * address comments.
    
    * address comments.
    
    * fix an error in mxnet.
    
    * address comments.
    
    * add edge_map.
    
    * add more test and fix bugs.
    Co-authored-by: Zheng <dzzhen@186590dc80ff.ant.amazon.com>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-131.us-east-2.compute.internal>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-26-167.us-east-2.compute.internal>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-150.us-west-2.compute.internal>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-135.us-west-2.compute.internal>
ndarray.cc 13.4 KB