1. 02 Mar, 2022 1 commit
  2. 30 Jan, 2022 2 commits
    • [Fix] sleep for a while when launching clients which will connect to … (#3704) · 9c8c162a
      Rhett Ying authored
      * [Fix] sleep for a while when launching clients which will connect to multiple servers
      
      * pre-allocate more ports
      
      * no multiple partitions on single machine
    • [Sampling] New sampling pipeline plus asynchronous prefetching (#3665) · 701b4fcc
      Quan (Andy) Gan authored
      * initial update
      
      * more
      
      * more
      
      * multi-gpu example
      
      * cluster gcn, finalize homogeneous
      
      * more explanation
      
      * fix
      
      * bunch of fixes
      
      * fix
      
      * RGAT example and more fixes
      
      * shadow-gnn sampler and some changes in unit test
      
      * fix
      
      * wth
      
      * more fixes
      
      * remove shadow+node/edge dataloader tests for possible ux changes
      
      * lints
      
      * add legacy dataloading import just in case
      
      * fix
      
      * update pylint for f-strings
      
      * fix
      
      * lint
      
      * lint
      
      * lint again
      
      * cherry-picking commit fa9f494
      
      * oops
      
      * fix
      
      * add sample_neighbors in dist_graph
      
      * fix
      
      * lint
      
      * fix
      
      * fix
      
      * fix
      
      * fix tutorial
      
      * fix
      
      * fix
      
      * fix
      
      * fix warning
      
      * remove debug
      
      * add get_foo_storage apis
      
      * lint
  3. 28 Jan, 2022 2 commits
  4. 26 Jan, 2022 1 commit
  5. 19 Jan, 2022 2 commits
  6. 11 Jan, 2022 1 commit
    • [Feature][Dist] change TP::Receiver/TP::Sender for multiple connections (#3574) · 37467e25
      Rhett Ying authored
      
      
      * [Feature] enable TP::Receiver wait for any numbers of senders
      
      * fix random unit test failure
      
      * avoid endless future wait
      
      * fix unit test failure
      
      * fix seg fault when finalize wait in receiver
      
      * [Feature] refactor sender connect logic and remove unnecessary sleeps in unit tests
      
      * fix lint
      
      * release RPCContext resources before process exits
      
      * [Debug] TPReceiver wait start log
      
      * [Debug] add log in get port
      
      * [Debug] add log
      
      * [ReDebug] revert time sleep in unit tests
      
      * [Debug] remove sleep for test_distri,test_mp
      
      * [debug] add more log
      
      * [debug] add listen_booted_ flag
      
      * [debug] restore commented code for queue
      
      * [debug] sleep more in rpc_client
      
      * restore change in tests
      
      * Revert "restore change in tests"
      
      This reverts commit 41a18926d181ec2517069389bfc41de2cc949280.
      
      * Revert "[debug] sleep more in rpc_client"
      
      This reverts commit a908e758eabca0a6ce62eb2e59baea02a840ac67.
      
      * Revert "[debug] restore commented code for queue"
      
      This reverts commit d3f993b3746e6bb6e2cc2f90204dd7e9461c6301.
      
      * Revert "[debug] add listen_booted_ flag"
      
      This reverts commit 244b2167d94942ff2a0acec8823b974975e52580.
      
      * Revert "[debug] add more log"
      
      This reverts commit 4b78447b0a575a824821dc7e25cca2246e6e30e2.
      
      * Revert "[Debug] remove sleep for test_distri,test_mp"
      
      This reverts commit e1df1aadcc8b1c2a0013ed77322ac391a8807612.
      
      * remove debug code
      
      * revert unnecessary change
      
      * revert unnecessary changes
      
      * always reset RPCContext when get started and reset all data
      
      * remove time.sleep in dist tests
      
      * fix lint
      
      * reset envs before each dist test
      
      * reset env properly
      
      * add time sleep when start each server
      
      * sleep for a while when boot server
      
      * replace wait_thread with callback
      
      * fix lint
      
      * add dglconnect handshake check
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  7. 06 Dec, 2021 2 commits
    • [RPC] Use tensorpipe for rpc communication (#3335) · a3ce780d
      Jinjing Zhou authored
      * doesn't know whether works
      
      * add change
      
      * fix
      
      * fix
      
      * fix
      
      * remove
      
      * revert
      
      * lint
      
      * lint
      
      * fix
      
      * revert
      
      * lint
      
      * fix
      
      * only build rpc on linux
      
      * lint
      
      * lint
      
      * fix build on windows
      
      * fix windows
      
      * remove old test
      
      * fix cmake
      
      * Revert "remove old test"
      
      This reverts commit f1ea75c777c34cdc1f08c0589676ba6aee1feb29.
      
      * fix windows
      
      * fix
      
      * fix
      
      * fix indent
      
      * fix indent
      
      * address comment
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * lint
      
      * fix indent
      
      * fix lint
      
      * add introduction
      
      * fix
      
      * lint
      
      * lint
      
      * add more logs
      
      * fix
      
      * update xbyak for C++14 with gcc5
      
      * Remove channels
      
      * fix
      
      * add test script
      
      * fix
      
      * remove unused file
      
      * fix lint
      
      * add timeout
    • [Distributed] Edge-type-specific fanouts for heterogeneous graphs (#3558) · eb08ef38
      Quan (Andy) Gan authored
      * first commit
      
      * second commit
      
      * spaghetti unit tests
      
      * rewrite test
  8. 12 Oct, 2021 1 commit
  9. 01 Sep, 2021 1 commit
  10. 29 Aug, 2021 1 commit
  11. 06 Aug, 2021 1 commit
  12. 28 Jul, 2021 2 commits
  13. 17 Jul, 2021 1 commit
  14. 13 Jul, 2021 1 commit
  15. 05 Jul, 2021 1 commit
  16. 02 Jul, 2021 2 commits
  17. 25 Jun, 2021 1 commit
  18. 16 Jun, 2021 1 commit
  19. 26 May, 2021 1 commit
  20. 18 May, 2021 1 commit
  21. 03 May, 2021 1 commit
  22. 26 Apr, 2021 1 commit
    • [Distributed] Fix a bug in graph partition. (#2869) · e7046f1e
      Da Zheng authored
      
      
      * update distributed training doc.
      
      * explain data split.
      
      * fix message passing.
      
      * id mapping.
      
      * fix.
      
      * test data reshuffling.
      
      * fix a bug.
      
      * fix test.
      
      * Revert "fix."
      
      This reverts commit 2d025e9e1a5c05c3da9b803a035a788ced59bd77.
      
      * Revert "id mapping."
      
      This reverts commit 2a6a93ceb81fbdff86e6e9e5a58e1ace1e9d9882.
      
      * Revert "fix message passing."
      
      This reverts commit ed8a86bf2b015e5e4f64ba160e81b207ad2a1d65.
      
      * Revert "explain data split."
      
      This reverts commit 4338ddf8a336014cf92d4cb9a1db02b9badc0e55.
      
      * Revert "update distributed training doc."
      
      This reverts commit dda1c35c44536934c19715534f01f832afda6ad2.
      
      * add more tests.
      
      * fix.
      
      * fix.
      
      * fix.
      Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
  23. 22 Apr, 2021 1 commit
  24. 13 Apr, 2021 1 commit
  25. 01 Apr, 2021 1 commit
  26. 30 Mar, 2021 1 commit
  27. 25 Jan, 2021 1 commit
    • [Distributed] Heterogeneous graph support (#2457) · 25ac3344
      Da Zheng authored
      
      
      * Distributed heterograph (#3)
      
      * heterogeneous graph partition.
      
      * fix graph partition book for heterograph.
      
      * load heterograph partitions.
      
      * update DistGraphServer to support heterograph.
      
      * make DistGraph runnable for heterograph.
      
      * partition a graph and store parts with homogeneous graph structure.
      
      * update DistGraph server&client to use homogeneous graph.
      
      * shuffle node Ids based on node types.
      
      * load mag in heterograph.
      
      * fix per-node-type mapping.
      
      * balance node types.
      
      * fix for homogeneous graph
      
      * store etype for now.
      
      * fix data name.
      
      * fix a bug in example.
      
      * add profiler in rgcn.
      
      * heterogeneous RGCN.
      
      * map homogeneous node ids to hetero node ids.
      
      * fix graph partition book.
      
      * fix DistGraph.
      
      * shuffle eids.
      
      * verify eids and their mappings when loading a partition.
      
      * Id map from homogeneous Ids to per-type Ids.
      
      * verify partitioned results.
      
      * add test for distributed sampler.
      
      * add mapping from per-type Ids to homogeneous Ids.
      
      * update example.
      
      * fix DistGraph.
      
      * Revert "add profiler in rgcn."
      
      This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676.
      
      * add tests for homogeneous graphs.
      
      * fix a bug.
      
      * fix test.
      
      * fix for one partition.
      
      * fix for standalone training and evaluation.
      
      * small fix.
      
      * fix two bugs.
      
      * initialize projection matrix.
      
      * small fix on RGCN.
      
      * Fix rgcn performance (#17)
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix.
      
      * fix test.
      
      * fix lint.
      
      * test partitions.
      
      * remove redundant test for partitioning.
      
      * remove commented code.
      
      * fix partition.
      
      * fix tests.
      
      * fix RGCN.
      
      * fix test.
      
      * fix test.
      
      * fix test.
      
      * fix.
      
      * fix a bug.
      
      * update dmlc-core.
      
      * fix.
      
      * fix rgcn.
      
      * update readme.
      
      * add comments.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
      
      * fix.
      
      * fix.
      
      * add div_int.
      
      * fix.
      
      * fix.
      
      * fix lint.
      
      * fix.
      
      * fix.
      
      * fix.
      
      * adjust.
      
      * move code.
      
      * handle heterograph.
      
      * return pytorch tensor in GPB.
      
      * remove some tests in example.
      
      * add to_block for distributed training.
      
      * use distributed to_block.
      
      * remove unnecessary function in DistGraph.
      
      * remove distributed to_block.
      
      * use pytorch tensor.
      
      * fix a bug in ntypes and etypes.
      
      * enable norm.
      
      * make the data loader compatible with the old format.
      
      * fix.
      
      * add comments.
      
      * fix a bug.
      
      * add test for heterograph.
      
      * support partition without reshuffle.
      
      * add test.
      
      * support partition without reshuffle.
      
      * fix.
      
      * add test.
      
      * fix bugs.
      
      * fix lint.
      
      * fix dataset.
      
      * fix for mxnet.
      
      * update docstring.
      
      * rename to floor_div
      
      * avoid exposing NodePartitionPolicy and EdgePartitionPolicy.
      
      * fix docstring.
      
      * fix error.
      
      * fixes.
      
      * fix comments.
      
      * rename.
      
      * rename.
      
      * explain IdMap.
      
      * fix docstring.
      
      * fix docstring.
      
      * update docstring.
      
      * remove the code of returning heterograph.
      
      * remove argument.
      
      * fix example.
      
      * make GraphPartitionBook an abstract class.
      
      * fix.
      
      * fix.
      
      * fix a bug.
      
      * fix a bug in example
      
      * fix a bug
      
      * reverse heterograph sampling.
      
      * temp fix.
      
      * fix lint.
      
      * Revert "temp fix."
      
      This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381.
      
      * compute norm.
      
      * Revert "reverse heterograph sampling."
      
      This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9.
      
      * fix.
      
      * move id_map.py
      
      * remove check
      
      * add more comments.
      
      * update docstring.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
  28. 18 Aug, 2020 1 commit
  29. 17 Aug, 2020 1 commit
    • [Doc/Feature] Refactor, doc update and behavior fix for graphs (#1983) · be444e52
      Mufei Li authored
      
      
      * Update graph
      
      * Fix for dgl.graph
      
      * from_scipy
      
      * Replace canonical_etypes with relations
      
      * from_networkx
      
      * Update for hetero_from_relations
      
      * Roll back the change of canonical_etypes to relations
      
      * heterograph
      
      * bipartite
      
      * Update doc
      
      * Fix lint
      
      * Fix lint
      
      * Fix test cases
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Update
      
      * Fix test
      
      * Fix
      
      * Update
      
      * Use DGLError
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Update
      
      * Fix
      
      * Update
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Update
      
      * Fix
      
      * Update
      
      * Fix
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Fix
      
      * Fix
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * rewrite sanity checks
      
      * delete unnecessary checks
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Fix
      
      * Update
      
      * Update
      
      * Update
      
      * Fix
      
      * Fix
      
      * Fix
      
      * Update
      
      * Fix
      
      * Update
      
      * Fix
      
      * Fix
      
      * Update
      
      * Fix
      
      * Update
      
      * Fix
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
      Co-authored-by: Quan Gan <coin2028@hotmail.com>
  30. 14 Aug, 2020 2 commits
  31. 12 Aug, 2020 1 commit
    • [Distributed] adjust various APIs. (#1993) · d1cf5c38
      Da Zheng authored
      * rename get_data_size.
      
      * remove g from DistTensor.
      
      * remove g from DistEmbedding.
      
      * clean up API of graph partition book.
      
      * fix DistGraph
      
      * fix lint.
      
      * collect all part policies.
      
      * fix.
      
      * fix.
      
      * support distributed sampler.
      
      * remove partition.py
  32. 11 Aug, 2020 1 commit
    • [Distributed] Remove server_count from ip_config.txt (#1985) · d340ea3a
      Chao Ma authored
      
      
      * remove server_count from ip_config.txt
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * lint
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * Update dist_context.py
      
      * fix lint.
      
      * make it work for multiple spaces.
      
      * update ip_config.txt.
      
      * fix examples.
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      Co-authored-by: Da Zheng <zhengda1936@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
  33. 10 Aug, 2020 1 commit