1. 03 May, 2020 1 commit
    • [Feature] Distributed graph store (#1383) · 2190c39d
      Da Zheng authored
      * initial version from distributed training.
      
      This is copied from multiprocessing training.
      
      * modify for distributed training.
      
      * it's runnable now.
      
      * measure time in neighbor sampling.
      
      * simplify neighbor sampling.
      
      * fix a bug in distributed neighbor sampling.
      
      * allow single-machine training.
      
      * fix a bug.
      
      * fix a bug.
      
      * fix openmp.
      
      * make some improvement.
      
      * fix.
      
      * add prepare in the sampler.
      
      * prepare nodeflow async.
      
      * fix a bug.
      
      * get id.
      
      * simplify the code.
      
      * improve.
      
      * fix partition.py
      
      * fix the example.
      
      * add more features.
      
      * fix the example.
      
      * allow one partition
      
      * use distributed kvstore.
      
      * do g2l map manually.
      
      * fix commandline.
      
      * a temp script to save reddit.
      
      * fix pull_handler.
      
      * add pytorch version.
      
      * estimate the time for copying data.
      
      * delete unused code.
      
      * fix a bug.
      
      * print id.
      
      * fix a bug
      
      * fix a bug
      
      * fix a bug.
      
      * remove redundant code.
      
      * revert modify in sampler.
      
      * fix temp script.
      
      * remove pytorch version.
      
      * fix.
      
      * distributed training with pytorch.
      
      * add distributed graph store.
      
      * fix.
      
      * add metis_partition_assignment.
      
      * fix a few bugs in distributed graph store.
      
      * fix test.
      
      * fix bugs in distributed graph store.
      
      * fix tests.
      
      * remove code of defining DistGraphStore.
      
      * fix partition.
      
      * fix example.
      
      * update run.sh.
      
      * only read necessary node data.
      
      * batching data fetch of multiple NodeFlows.
      
      * simplify gcn.
      
      * remove unnecessary code.
      
      * use the new copy_from_kvstore.
      
      * update training script.
      
      * print time in graphsage.
      
      * make distributed training runnable.
      
      * use val_nid.
      
      * fix train_sampling.
      
      * add distributed training.
      
      * add run.sh
      
      * add more timing.
      
      * fix a bug.
      
      * save graph metadata when partition.
      
      * create ndata and edata in distributed graph store.
      
      * add timing in minibatch training of GraphSage.
      
      * use pytorch distributed.
      
      * add checks.
      
      * fix a bug in global vs. local ids.
      
      * remove fast pull
      
      * fix a compile error.
      
      * update and add new APIs.
      
      * implement more methods in DistGraphStore.
      
      * update more APIs.
      
      * rename it to DistGraph.
      
      * rename to DistTensor
      
      * remove some unnecessary API.
      
      * remove unnecessary files.
      
      * revert changes in sampler.
      
      * Revert "simplify gcn."
      
      This reverts commit 0ed3a34ca714203a5b45240af71555d4227ce452.
      
      * Revert "simplify neighbor sampling."
      
      This reverts commit 551c72d20f05a029360ba97f312c7a7a578aacec.
      
      * Revert "measure time in neighbor sampling."
      
      This reverts commit 63ae80c7b402bb626e24acbbc8fdfe9fffd0bc64.
      
      * Revert "add timing in minibatch training of GraphSage."
      
      This reverts commit e59dc8957a414c7df5c316f51d78bce822bdef5e.
      
      * Revert "fix train_sampling."
      
      This reverts commit ea6aea9a4aabb8ba0ff63070aa51e7ca81536ad9.
      
      * fix lint.
      
      * add comments and small update.
      
      * add more comments.
      
      * add more unit tests and fix bugs.
      
      * check the existence of shared-mem graph index.
      
      * use new partitioned graph storage.
      
      * fix bugs.
      
      * print error in fast pull.
      
      * fix lint
      
      * fix a compile error.
      
      * save absolute path after partitioning.
      
      * small fixes in the example
      
      * Revert "[kvstore] support any data type for init_data() (#1465)"
      
      This reverts commit 87b6997b.
      
      * fix a bug.
      
      * disable evaluation.
      
      * Revert "Revert "[kvstore] support any data type for init_data() (#1465)""
      
      This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee.
      
      * support set and init data.
      
      * support set and init data.
      
      * Revert "Revert "[kvstore] support any data type for init_data() (#1465)""
      
      This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee.
      
      * fix bugs.
      
      * fix unit test.
      
      * move to dgl.distributed.
      
      * fix lint.
      
      * fix lint.
      
      * remove local_nids.
      
      * fix lint.
      
      * fix test.
      
      * remove train_dist.
      
      * revert train_sampling.
      
      * rename funcs.
      
      * address comments.
      
      * address comments.
      
      Use NodeDataView/EdgeDataView to keep track of data.
      
      * address comments.
      
      * address comments.
      
      * revert.
      
      * save data with DGL serializer.
      
      * use the right way of getting shape.
      
      * fix lint.
      
      * address comments.
      
      * address comments.
      
      * fix an error in mxnet.
      
      * address comments.
      
      * add edge_map.
      
      * add more test and fix bugs.
      Co-authored-by: Zheng <dzzhen@186590dc80ff.ant.amazon.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-131.us-east-2.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-26-167.us-east-2.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-150.us-west-2.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-135.us-west-2.compute.internal>
  2. 28 Apr, 2020 2 commits
  3. 26 Apr, 2020 1 commit
  4. 18 Mar, 2020 1 commit
  5. 15 Mar, 2020 1 commit
  6. 10 Mar, 2020 1 commit
  7. 07 Mar, 2020 1 commit
    • [Model][Sampler] GraphSAGE model, bipartite graph conversion & remove edges API (#1297) · a9520f71
      Quan (Andy) Gan authored
      * remove edge and to bipartite and graphsage with sampling
      
      * fixes
      
      * fixes
      
      * fixes
      
      * reenable multigpu training
      
      * fixes
      
      * compatibility in DGLGraph
      
      * rename to compact_as_bipartite
      
      * bugfix
      
      * lint
      
      * add offline inference
      
      * skip GPU tests
      
      * fix
      
      * addresses comments
      
      * fix
      
      * fix
      
      * fix
      
      * more tests
      
      * more docs and unit tests
      
      * workaround for empty slice on empty data
  8. 04 Nov, 2019 1 commit
  9. 03 Nov, 2019 1 commit
    • [NN] nn modules & examples update (#890) · 9a0511c8
      Zihao Ye authored
      * upd
      
      * damn it
      
      * fuck
      
      * fuck pylint
      
      * fudge
      
      * remove some comments about MXNet
      
      * upd
      
      * upd
      
      * damn it
      
      * damn it
      
      * fuck
      
      * fuck
      
      * upd
      
      * upd
      
      * pylint bastard
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
  10. 30 Oct, 2019 1 commit
    • [Bug Fix] Fix package reliability bug of networkx (#949) · 82499e60
      xiang song(charlie.song) authored
      * upd
      
      * fix edgebatch edges
      
      * add test
      
      * trigger
      
      * Update README.md for pytorch PinSage example.
      
      Add a note that the PinSage model example under
      example/pytorch/recommendation only works with Python 3.6+,
      as its dataset loader depends on the stanfordnlp package,
      which works only with Python 3.6+.
      
      * Provide a frame agnostic API to test nn modules on both CPU and CUDA side.
      
      1. make dgl.nn.xxx frame agnostic
      2. make test.backend include dgl.nn modules
      3. modify test_edge_softmax of test/mxnet/test_nn.py and
          test/pytorch/test_nn.py to work on both CPU and GPU
      
      * Fix style
      
      * Delete unused code
      
      * Make agnostic test only related to tests/backend
      
      1. clear all agnostic related code in dgl.nn
      2. make test_graph_conv agnostic to cpu/gpu
      
      * Fix code style
      
      * fix
      
      * doc
      
      * Make all test code under tests.mxnet/pytorch.test_nn.py
      work on both CPU and GPU.
      
      * Fix syntax
      
      * Remove rand
      
      * Add TAGCN nn.module and example
      
      * Now tagcn can run on CPU.
      
      * Add unit test for TGConv
      
      * Fix style
      
      * For pubmed dataset, using --lr=0.005 can achieve better acc
      
      * Fix style
      
      * Fix some descriptions
      
      * trigger
      
      * Fix doc
      
      * Add nn.TGConv and example
      
      * Fix bug
      
      * Update data in mxnet.tagcn test acc.
      
      * Fix some comments and code
      
      * delete useless code
      
      * Fix naming
      
      * Fix bug
      
      * Fix bug
      
      * Add test for mxnet TAGCov
      
      * Add test code for mxnet TAGCov
      
      * Update some docs
      
      * Fix some code
      
      * Update docs dgl.nn.mxnet
      
      * Update weight init
      
      * Fix
      
      * reproduce the bug
      
      * Fix concurrency bug reported at #755.
      Also make test_shared_mem_store.py more deterministic.
      
      * Update test_shared_mem_store.py
      
      * Update dmlc/core
      
      * networkx >= 2.4 will break our examples
      
      * Update tutorials/requirements
      
      * fix selfloop edges
      
      * upd version
  11. 29 Oct, 2019 1 commit
  12. 27 Aug, 2019 2 commits
  13. 23 May, 2019 1 commit
  14. 22 Feb, 2019 1 commit