1. 09 Feb, 2022 1 commit
    • [Feature] CUDA UVA sampling for MultiLayerNeighborSampler (#3674) · 738e8318
      Xin Yao authored
      
      
      * implement pin_memory/unpin_memory/is_pinned for dgl.graph
      
      * update python docstring
      
      * update c++ docstring
      
      * add test
      
      * fix the broken UnifiedTensor
      
      * XPU_SWITCH for kDLCPUPinned
      
      * a rough version ready for testing
      
      * eliminate extra context parameter for pin/unpin
      
      * update train_sampling
      
      * fix linting
      
      * fix typo
      
      * multi-gpu uva sampling case
      
      * disable new format materialization for pinned graphs
      
      * update python doc for pin_memory_
      
      * fix unit test
      
      * UVA sampling for link prediction
      
      * dispatch most csr ops
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update doc
      
      * update examples
      
      * change unitgraph and heterograph's PinMemory to in-place
      
      * update examples for multi-gpu uva sampling
      
      * update doc
      
      * fix linting
      
      * fix cpu build
      
      * fix is_pinned for DistGraph
      
      * fix is_pinned for DistGraph
      
      * update graphsage unsupervised example
      
      * update doc for gpu sampling
      
      * update some check for sampling device switching
      
      * fix linting
      
      * adapt for new dataloader
      
      * fix linting
      
      * fix
      
      * fix some name issue
      
      * adjust device check
      
      * add unit test for uva sampling & fix some zero_copy bug
      
      * fix linting
      
      * update num_threads in graphsage examples
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  2. 30 Jan, 2022 1 commit
    • [Sampling] New sampling pipeline plus asynchronous prefetching (#3665) · 701b4fcc
      Quan (Andy) Gan authored
      * initial update
      
      * more
      
      * more
      
      * multi-gpu example
      
      * cluster gcn, finalize homogeneous
      
      * more explanation
      
      * fix
      
      * bunch of fixes
      
      * fix
      
      * RGAT example and more fixes
      
      * shadow-gnn sampler and some changes in unit test
      
      * fix
      
      * wth
      
      * more fixes
      
      * remove shadow+node/edge dataloader tests for possible ux changes
      
      * lints
      
      * add legacy dataloading import just in case
      
      * fix
      
      * update pylint for f-strings
      
      * fix
      
      * lint
      
      * lint
      
      * lint again
      
      * cherry-picking commit fa9f494
      
      * oops
      
      * fix
      
      * add sample_neighbors in dist_graph
      
      * fix
      
      * lint
      
      * fix
      
      * fix
      
      * fix
      
      * fix tutorial
      
      * fix
      
      * fix
      
      * fix
      
      * fix warning
      
      * remove debug
      
      * add get_foo_storage apis
      
      * lint
  3. 26 Jan, 2022 1 commit
  4. 10 Aug, 2021 1 commit
  5. 06 Aug, 2021 1 commit
  6. 05 Aug, 2021 1 commit
  7. 28 Jul, 2021 2 commits
  8. 17 Jul, 2021 1 commit
  9. 15 Jul, 2021 1 commit
  10. 05 Jul, 2021 1 commit
  11. 16 Jun, 2021 1 commit
  12. 02 Jun, 2021 1 commit
  13. 26 May, 2021 1 commit
  14. 18 May, 2021 1 commit
  15. 25 Jan, 2021 1 commit
    • [Distributed] Heterogeneous graph support (#2457) · 25ac3344
      Da Zheng authored
      
      
      * Distributed heterograph (#3)
      
      * heterogeneous graph partition.
      
      * fix graph partition book for heterograph.
      
      * load heterograph partitions.
      
      * update DistGraphServer to support heterograph.
      
      * make DistGraph runnable for heterograph.
      
      * partition a graph and store parts with homogeneous graph structure.
      
      * update DistGraph server&client to use homogeneous graph.
      
      * shuffle node Ids based on node types.
      
      * load mag in heterograph.
      
      * fix per-node-type mapping.
      
      * balance node types.
      
      * fix for homogeneous graph
      
      * store etype for now.
      
      * fix data name.
      
      * fix a bug in example.
      
      * add profiler in rgcn.
      
      * heterogeneous RGCN.
      
      * map homogeneous node ids to hetero node ids.
      
      * fix graph partition book.
      
      * fix DistGraph.
      
      * shuffle eids.
      
      * verify eids and their mappings when loading a partition.
      
      * Id map from homogeneous Ids to per-type Ids.
      
      * verify partitioned results.
      
      * add test for distributed sampler.
      
      * add mapping from per-type Ids to homogeneous Ids.
      
      * update example.
      
      * fix DistGraph.
      
      * Revert "add profiler in rgcn."
      
      This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676.
      
      * add tests for homogeneous graphs.
      
      * fix a bug.
      
      * fix test.
      
      * fix for one partition.
      
      * fix for standalone training and evaluation.
      
      * small fix.
      
      * fix two bugs.
      
      * initialize projection matrix.
      
      * small fix on RGCN.
      
      * Fix rgcn performance (#17)
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix.
      
      * fix test.
      
      * fix lint.
      
      * test partitions.
      
      * remove redundant test for partitioning.
      
      * remove commented code.
      
      * fix partition.
      
      * fix tests.
      
      * fix RGCN.
      
      * fix test.
      
      * fix test.
      
      * fix test.
      
      * fix.
      
      * fix a bug.
      
      * update dmlc-core.
      
      * fix.
      
      * fix rgcn.
      
      * update readme.
      
      * add comments.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
      
      * fix.
      
      * fix.
      
      * add div_int.
      
      * fix.
      
      * fix.
      
      * fix lint.
      
      * fix.
      
      * fix.
      
      * fix.
      
      * adjust.
      
      * move code.
      
      * handle heterograph.
      
      * return pytorch tensor in GPB.
      
      * remove some tests in example.
      
      * add to_block for distributed training.
      
      * use distributed to_block.
      
      * remove unnecessary function in DistGraph.
      
      * remove distributed to_block.
      
      * use pytorch tensor.
      
      * fix a bug in ntypes and etypes.
      
      * enable norm.
      
      * make the data loader compatible with the old format.
      
      * fix.
      
      * add comments.
      
      * fix a bug.
      
      * add test for heterograph.
      
      * support partition without reshuffle.
      
      * add test.
      
      * support partition without reshuffle.
      
      * fix.
      
      * add test.
      
      * fix bugs.
      
      * fix lint.
      
      * fix dataset.
      
      * fix for mxnet.
      
      * update docstring.
      
      * rename to floor_div
      
      * avoid exposing NodePartitionPolicy and EdgePartitionPolicy.
      
      * fix docstring.
      
      * fix error.
      
      * fixes.
      
      * fix comments.
      
      * rename.
      
      * rename.
      
      * explain IdMap.
      
      * fix docstring.
      
      * fix docstring.
      
      * update docstring.
      
      * remove the code of returning heterograph.
      
      * remove argument.
      
      * fix example.
      
      * make GraphPartitionBook an abstract class.
      
      * fix.
      
      * fix.
      
      * fix a bug.
      
      * fix a bug in example
      
      * fix a bug
      
      * reverse heterograph sampling.
      
      * temp fix.
      
      * fix lint.
      
      * Revert "temp fix."
      
      This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381.
      
      * compute norm.
      
      * Revert "reverse heterograph sampling."
      
      This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9.
      
      * fix.
      
      * move id_map.py
      
      * remove check
      
      * add more comments.
      
      * update docstring.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
  16. 08 Jan, 2021 1 commit
  17. 19 Aug, 2020 1 commit
  18. 18 Aug, 2020 1 commit
  19. 12 Aug, 2020 1 commit
    • [Distributed] adjust various APIs. (#1993) · d1cf5c38
      Da Zheng authored
      * rename get_data_size.
      
      * remove g from DistTensor.
      
      * remove g from DistEmbedding.
      
      * clean up API of graph partition book.
      
      * fix DistGraph
      
      * fix lint.
      
      * collect all part policies.
      
      * fix.
      
      * fix.
      
      * support distributed sampler.
      
      * remove partition.py
  20. 11 Aug, 2020 1 commit
    • [Distributed] Remove server_count from ip_config.txt (#1985) · d340ea3a
      Chao Ma authored
      
      
      * remove server_count from ip_config.txt
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * lint
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * Update dist_context.py
      
      * fix lint.
      
      * make it work for multiple spaces.
      
      * update ip_config.txt.
      
      * fix examples.
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      Co-authored-by: Da Zheng <zhengda1936@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
  21. 10 Aug, 2020 1 commit
  22. 08 Aug, 2020 1 commit
    • [Distributed] Add roles (#1971) · 5454471f
      Da Zheng authored
      * distinguish roles.
      
      * add comments.
      
      * fix lint.
      
      * move roles to server_state.
      
      * fix text.
      
      * fix tests.
      
      * fix tests.
      
      * Revert "fix tests."
      
      This reverts commit 5baa136b872a4550d4e612bfb1dfe363d7814adf.
  23. 05 Aug, 2020 1 commit
    • [Distributed] DistDataloader (#1901) · 4f499c7f
      Jinjing Zhou authored
      
      
      * 111
      
      * 111
      
      * fix
      
      * 111
      
      * fix
      
      * 11
      
      * fix
      
      * lint
      
      * Update __init__.py
      
      * lint
      
      * fix
      
      * lint
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * try fix
      
      * try fix
      
      * fix
      
      * Revert "fix"
      
      This reverts commit a0b954fd4e99b7df92b53db8334dcb583d6e1551.
      
      * fixes.
      
      * fix.
      
      * fix test.
      
      * fix exit.
      
      * fix.
      
      * fix
      
      * fix
      
      * lint
      
      * lint
      
      * lint
      
      * fix
      
      * Update .gitignore
      
      * 111
      
      * fix
      
      * 111
      
      * 111
      
      * fff
      
      * 1111
      
      * 111
      
      * 1325315
      
      * ffff
      
      * f???
      
      * fff
      
      * 1111
      
      * 111
      
      * fix
      
      * 111
      
      * asda
      
      * 1111
      
      * 11
      
      * 123
      
      * aaaaaaaaaaaaaaaa
      
      * spawn
      
      * 1231231
      
      * up
      
      * 111
      
      * fix
      
      * fix
      
      * Revert "fix"
      
      This reverts commit 7373f95312fdcaa36d2fc330bf242339e89c045d.
      
      * fix
      
      * fix
      
      * 1111
      
      * fix
      
      * fix tests
      
      * start kvclient as early as possible.
      
      * lint
      
      * fix test
      
      * lint
      
      * 1111
      
      * fix
      
      * fix
      
      * 111
      
      * fix
      
      * fix
      
      * 1
      
      * fix
      
      * fix
      
      * lint
      
      * fix
      
      * lint
      
      * lint
      
      * remove quit
      
      * fix
      
      * lint
      
      * fix
      
      * fix several
      
      * lint
      
      * fix minor
      
      * fix
      
      * lint
      Co-authored-by: Da Zheng <zhengda1936@gmail.com>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
  24. 03 Aug, 2020 1 commit
    • [Distributed] Support multiple servers (#1886) · a4c931a9
      Da Zheng authored
      
      
      * client init graph on the backup servers.
      
      * fix.
      
      * test multi-server.
      
      * fix anonymous dist tensors.
      
      * check #parts.
      
      * fix init_data
      
      * add multi-server multi-client tests.
      
      * update tests in kvstore.
      
      * fix.
      
      * verify the loaded partition.
      
      * fix a bug.
      
      * fix lint.
      
      * fix.
      
      * fix example.
      
      * fix rpc.
      
      * fix pull/push handler for backup kvstore
      
      * fix example readme.
      
      * change ip.
      
      * update docstring.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-1.us-west-2.compute.internal>
  25. 01 Aug, 2020 1 commit
  26. 31 Jul, 2020 1 commit
  27. 29 Jul, 2020 1 commit
  28. 28 Jul, 2020 2 commits
  29. 27 Jul, 2020 1 commit
  30. 22 Jul, 2020 1 commit
  31. 20 Jul, 2020 1 commit
    • [RPC] Rpc exit with explicit invocation (#1825) · 5c92f6c2
      Chao Ma authored
      * exit client
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update test
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
  32. 15 Jul, 2020 1 commit
  33. 14 Jul, 2020 1 commit
  34. 09 Jul, 2020 1 commit
  35. 06 Jul, 2020 1 commit
  36. 03 Jul, 2020 1 commit
    • [Feature] add sparse embedding. (#1497) · cadcc1c2
      Da Zheng authored
      * add sparse embedding.
      
      * fix
      
      * add test.
      
      * many fixes.
      
      * many fixes
      
      * fix sparse emb.
      
      * fix.
      
      * fix lint.
      
      * fix lint.
      
      * fix kvstore.
      
      * expose DistTensor.
      
      * test sparse embeddings.
      
      * add attach_grad to the backends.
      
      * remove part_id
      
      * fix.
      
      * move backward computation.
      
      * move more computation to backend.
      
      * fix a bug when applying learning rate.
      
      * fix a few things.
      
      * fix a few things.
      
      * add docstring
      
      * fix.
      
      * apply no_grad.
      
      * fix tests.
      
      * fix for other frameworks.
      
      * add examples in docstring.
  37. 01 Jul, 2020 1 commit
  38. 28 Jun, 2020 1 commit
    • [Distributed] Pytorch example of distributed GraphSage. (#1495) · 02d31974
      Da Zheng authored
      
      
      * add train_dist.
      
      * Fix sampling example.
      
      * use distributed sampler.
      
      * fix a bug in DistTensor.
      
      * fix distributed training example.
      
      * add graph partition.
      
      * add command
      
      * disable pytorch parallel.
      
      * shutdown correctly.
      
      * load diff graphs.
      
      * add ip_config.txt.
      
      * record timing for each step.
      
      * use ogb
      
      * add profiler.
      
      * fix a bug.
      
      * add train_dist.
      
      * Fix sampling example.
      
      * use distributed sampler.
      
      * fix a bug in DistTensor.
      
      * fix distributed training example.
      
      * add graph partition.
      
      * add command
      
      * disable pytorch parallel.
      
      * shutdown correctly.
      
      * load diff graphs.
      
      * add ip_config.txt.
      
      * record timing for each step.
      
      * use ogb
      
      * add profiler.
      
      * add Ips of the cluster.
      
      * fix exit.
      
      * support multiple clients.
      
      * balance node types and edges.
      
      * move code.
      
      * remove run.sh
      
      * Revert "support multiple clients."
      
      * fix.
      
      * update train_sampling.
      
      * fix.
      
      * fix
      
      * remove run.sh
      
      * update readme.
      
      * update readme.
      
      * use pytorch distributed.
      
      * ensure all trainers run the same number of steps.
      
      * Update README.md
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>