1. 28 May, 2022 2 commits
  2. 26 May, 2022 1 commit
  3. 25 May, 2022 1 commit
  4. 17 May, 2022 1 commit
  5. 16 May, 2022 2 commits
  6. 12 May, 2022 1 commit
  7. 11 May, 2022 1 commit
  8. 27 Apr, 2022 1 commit
    • [Feature] enable socket net_type for rpc (#3951) · 37be02a4
      Rhett Ying authored
      * [Feature] enable socket net_type for rpc
      
      * fix lint
      
      * fix lint
      
      * fix build issue on windows
      
      * fix test failure on windows
      
      * fix test failure
      
      * fix cpp unit test failure
      
      * net_type blocking max_try_times
      
      * fix other comments
      
      * fix lint
      
      * fix comment
      
      * fix lint
      
      * fix cpp
  9. 26 Apr, 2022 1 commit
  10. 12 Apr, 2022 1 commit
  11. 11 Apr, 2022 1 commit
  12. 09 Apr, 2022 1 commit
  13. 05 Apr, 2022 1 commit
    • [Examples] Update graphsage multi-gpu example to use multiple GPUs for... · 27a6eb56
      nv-dlasalle authored
      
      [Examples] Update graphsage multi-gpu example to use multiple GPUs for validation and testing. (#3827)
      
      * Update graphsage multi-gpu example to use multiple GPUs for validation and
      testing.
      
      * Remove argmax
      
      * Fix rebase error
      
      * Add more documentation to example and simplify
      
      * Switch to name shared memory
      
      * Add comment about how training is distributed
      
      * Restore iteration count
      
      * fix munmap error reporting for better error messages
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
  14. 31 Mar, 2022 1 commit
  15. 27 Mar, 2022 1 commit
    • [Feature] METIS Partition with Communication Volume Minimization (#3821) · fbbca994
      Cheng Wan authored
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * fix OpenMP compatibility issues
      
      * typo
      
      * partition
      
      * misc
      
      * fix typo
      
      * num_parts=1
      
      * import torch
      
      * long
      
      * print info
      
      * print info
      
      * print info
      
      * upd
      
      * remove debug code
      
      * revert partition.py
      
      * fix cut count
      
      * fix cut count
      
      * Revert "fix cut count"
      
      This reverts commit 10926b4fd48f45c8f1ddb58be7db6c22e653effd.
      
      * Revert "fix cut count"
      
      This reverts commit 76465283bef093a2b4209ad70dd15d2437b2ec8a.
      
      * type of deprecate
      
      * typo in deprecate info
      
      * fix typo
      
      * use cv for partitioning
      
      * CE
      
      * no message
      
      * revert
      
      * typo
      
      * add objtype
      
      * no message
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * ?
      
      * semicolon
      
      * drop tensors
      
      * no message
      
      * backward
      
      * backward
      
      * max op
      
      * store X.shape
      
      * th
      
      * test
      
      * Revert "test"
      
      This reverts commit 92b3b2f64a3a1128590098fa03ce429c5466e6ce.
      
      * test
      
      * tolist
      
      * debug
      
      * to cuda
      
      * tuple
      
      * fix bug
      
      * remove X
      
      * no message
      
      * fix bug
      
      * workload balance
      
      * Revert "workload balance"
      
      This reverts commit d7f8e4a16ba2a7eabb4a9bb945523bfe6623e723.
      
      * reverse
      
      * Revert "reverse"
      
      This reverts commit 8a71cf25685aa7d889b9b8881b46f7a16b7d6e6d.
      
      * Revert "Revert "reverse""
      
      This reverts commit 196b143932d5cf9813576ece7c990b63d322d063.
      
      * Revert "Revert "Revert "reverse"""
      
      This reverts commit cf9e89a07013582056e7cde235e51331aca7fa9c.
      
      * no message
      
      * Merge commit '5498cf05'
      
      # Conflicts:
      #	python/dgl/distributed/partition.py
      
      * Revert "Merge commit '5498cf05'"
      
      This reverts commit f79be2ad777897c7025b28308454cad81ad6bb27.
      
      * fix bug
      
      * third party
      
      * no message
      
      * try to avoid memory leak
      
      * try to avoid memory leak
      
      * avoid memory leak with no hope
      
      * Revert "avoid memory leak with no hope"
      
      This reverts commit c77befe9479f46758e744642f66dd209b50eef7d.
      
      * no message
      
      * Revert "no message"
      
      This reverts commit 478cb28fe25fb1002b2f1dc202bb9bdaad8b2a56.
      
      * del
      
      * Revert "del"
      
      This reverts commit 1b468e45ce646b400ff3ffa61a0b2da058b3bdfd.
      
      * no message
      
      * no message
      
      * Revert "no message"
      
      This reverts commit 92e4f5561ed42da0606618b2fff9f1ad5ed439d9.
      
      * third party
      
      * document
      
      * Update metis_partition.cc
      
      * Update metis_partition_hetero.cc
      
      * Update metis_partition_hetero.cc
      
      * Update partition.py
      
      * Update partition.py
      
      * Update partition.py
      Co-authored-by: yzh119 <expye@outlook.com>
      Co-authored-by: chwan-rice <54331508+chwan-rice@users.noreply.github.com>
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
      Co-authored-by: Da Zheng <zhengda1936@gmail.com>
  16. 24 Mar, 2022 2 commits
  17. 10 Mar, 2022 1 commit
  18. 01 Mar, 2022 1 commit
  19. 28 Feb, 2022 2 commits
  20. 27 Feb, 2022 1 commit
  21. 23 Feb, 2022 2 commits
    • e7ad4c9c
      sanchit-misra authored
    • [NN] Rework RelGraphConv and HGTConv (#3742) · 0227ddfb
      Minjie Wang authored
      * WIP: TypedLinear and new RelGraphConv
      
      * wip
      
      * further simplify RGCN
      
      * a bunch of tweak for performance; add basic cpu support
      
      * update on segmm
      
      * wip: segment.cu
      
      * new backward kernel works
      
      * fix a bunch of bugs in kernel; leave idx_a for future
      
      * add nn test for typed_linear
      
      * rgcn nn test
      
      * bugfix in corner case; update RGCN README
      
      * doc
      
      * fix cpp lint
      
      * fix lint
      
      * fix ut
      
      * wip: hgtconv; presorted flag for rgcn
      
      * hgt code and ut; WIP: some fix on reorder graph
      
      * better typed linear init
      
      * fix ut
      
      * fix lint; add docstring
  22. 21 Feb, 2022 1 commit
    • [Bugfix] Bug fixes in new dataloader (#3727) · 3f138eba
      Quan (Andy) Gan authored
      
      
      * fixes
      
      * fix
      
      * more fixes
      
      * update
      
      * oops
      
      * lint?
      
      * temporarily revert - will fix in another PR
      
      * more fixes
      
      * skipping mxnet test
      
      * address comments
      
      * fix DDP
      
      * fix edge dataloader exclusion problems
      
      * stupid bug
      
      * fix
      
      * use_uvm option
      
      * fix
      
      * fixes
      
      * fixes
      
      * fixes
      
      * fixes
      
      * add evaluation for cluster gcn and ddp
      
      * stupid bug again
      
      * fixes
      
      * move sanity checks to only support DGLGraphs
      
      * pytorch lightning compatibility fixes
      
      * remove
      
      * poke
      
      * more fixes
      
      * fix
      
      * fix
      
      * disable test
      
      * docstrings
      
      * why is it getting a memory leak?
      
      * fix
      
      * update
      
      * updates and temporarily disable forkingpickler
      
      * update
      
      * fix?
      
      * fix?
      
      * oops
      
      * oops
      
      * fix
      
      * lint
      
      * huh
      
      * uh
      
      * update
      
      * fix
      
      * made it memory efficient
      
      * refine exclude interface
      
      * fix tutorial
      
      * fix tutorial
      
      * fix graph duplication in CPU dataloader workers
      
      * lint
      
      * lint
      
      * Revert "lint"
      
      This reverts commit 805484dd553695111b5fb37f2125214a6b7276e9.
      
      * Revert "lint"
      
      This reverts commit 0bce411b2b415c2ab770343949404498436dc8b2.
      
      * Revert "fix graph duplication in CPU dataloader workers"
      
      This reverts commit 9e3a8cf34c175d3093c773f6bb023b155f2bd27f.
      Co-authored-by: xiny <xiny@nvidia.com>
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  23. 18 Feb, 2022 2 commits
  24. 15 Feb, 2022 1 commit
    • [Feature] Gather mm (#3641) · b3d3a2c4
      Israt Nisa authored
      
      
      * init
      
      * init
      
      * working cublasGemm
      
      * benchmark high-mem/low-mem, err gather_mm output
      
      * cuda kernel for bmm like kernel
      
      * removed cpu copy for E_per_Rel
      
      * benchmark code from Minjie
      
      * fixed cublas results in gathermm sorted
      
      * use GPU shared mem in unsorted gather mm
      
      * minor
      
      * Added an optimal version of gather_mm_unsorted
      
      * lint
      
      * init gather_mm_scatter
      
      * cublas transpose added
      
      * fixed h_offset for multiple rel
      
      * backward unittest
      
      * cublas support to transpose W
      
      * adding missed file
      
      * forgot to add header file
      
      * lint
      
      * lint
      
      * cleanup
      
      * lint
      
      * docstring
      
      * lint
      
      * added unittest
      
      * lint
      
      * lint
      
      * unittest
      
      * changed err type
      
      * skip cpu test
      
      * skip CPU code
      
      * move in-len loop inside
      
      * lint
      
      * added check different dim length for B
      
      * w_per_len is optional now
      
      * moved gather_mm to pytorch/backend with backward support
      
      * removed a_/b_trans support
      
      * transpose op inside GEMM call
      
      * removed out alloc from API, changed W 2D to 3D
      
      * Added se_gather_mm, Separate API for sortedE
      
      * Fixed gather_mm (unsorted) user interface
      
      * unsorted gmm backward + separate CAPI for un/sorted A
      
      * typecast to float to support atomicAdd
      
      * lint typecast
      
      * lint
      
      * added gather_mm_scatter
      
      * minor
      
      * const
      
      * design changes
      
      * Added idx_a, idx_b support gmm_scatter
      
      * dgl doc
      
      * lint
      
      * adding gather_mm in ops
      
      * lint
      
      * lint
      
      * minor
      
      * removed benchmark files
      
      * minor
      
      * empty commit
      Co-authored-by: Israt Nisa <nisisrat@amazon.com>
  25. 11 Feb, 2022 1 commit
    • New fused edge_softmax op (#3650) · bc8f8b0b
      ranzhejiang authored
      
      
      * [feature] edge softmax refact.
      
      * delete file
      
      * fix backward and cmake version
      
      * fix backward
      
      * format function
      
      * fix setting
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * refix
      
      * add cuda kernel for backward and rename some function
      
      * add benchmark for edge_softmax
      
      * fix format
      
      * remove cuda_backwrd
      
      * fix code format and add comment for op on CPU
      
      * fix lint
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  26. 09 Feb, 2022 1 commit
    • [Feature] CUDA UVA sampling for MultiLayerNeighborSampler (#3674) · 738e8318
      Xin Yao authored
      
      
      * implement pin_memory/unpin_memory/is_pinned for dgl.graph
      
      * update python docstring
      
      * update c++ docstring
      
      * add test
      
      * fix the broken UnifiedTensor
      
      * XPU_SWITCH for kDLCPUPinned
      
      * a rough version ready for testing
      
      * eliminate extra context parameter for pin/unpin
      
      * update train_sampling
      
      * fix linting
      
      * fix typo
      
      * multi-gpu uva sampling case
      
      * disable new format materialization for pinned graphs
      
      * update python doc for pin_memory_
      
      * fix unit test
      
      * UVA sampling for link prediction
      
      * dispatch most csr ops
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update doc
      
      * update examples
      
      * change unitgraph and heterograph's PinMemory to in-place
      
      * update examples for multi-gpu uva sampling
      
      * update doc
      
      * fix linting
      
      * fix cpu build
      
      * fix is_pinned for DistGraph
      
      * fix is_pinned for DistGraph
      
      * update graphsage unsupervised example
      
      * update doc for gpu sampling
      
      * update some check for sampling device switching
      
      * fix linting
      
      * adapt for new dataloader
      
      * fix linting
      
      * fix
      
      * fix some name issue
      
      * adjust device check
      
      * add unit test for uva sampling & fix some zero_copy bug
      
      * fix linting
      
      * update num_threads in graphsage examples
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  27. 26 Jan, 2022 1 commit
  28. 21 Jan, 2022 1 commit
    • [Feature] Pin dgl.graph to the page-locked memory (#3616) · 40b44a43
      Xin Yao authored
      
      
      * implement pin_memory/unpin_memory/is_pinned for dgl.graph
      
      * update python docstring
      
      * update c++ docstring
      
      * add test
      
      * fix the broken UnifiedTensor
      
      * eliminate extra context parameter for pin/unpin
      
      * fix linting
      
      * fix typo
      
      * disable new format materialization for pinned graphs
      
      * update python doc for pin_memory_
      
      * fix unit test
      
      * update doc
      
      * change unitgraph and heterograph's PinMemory to in-place
      
      * update comments for NDArray's PinMemory_ and PinData
      
      * update doc
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  29. 19 Jan, 2022 1 commit
  30. 17 Jan, 2022 2 commits
  31. 11 Jan, 2022 2 commits
    • Pass the std::min argument's type, to avoid the compilation error. (#3637) · b002f8f9
      MaoYuan Xian authored
      
      
      * Pass the std::min argument's type, to avoid the compilation error.
      
      * Update parallel_for.h
      
      * Update negative_sampling.cc
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
    • [Feature][Dist] change TP::Receiver/TP::Sender for multiple connections (#3574) · 37467e25
      Rhett Ying authored
      
      
      * [Feature] enable TP::Receiver wait for any numbers of senders
      
      * fix random unit test failure
      
      * avoid endless future wait
      
      * fix unit test failure
      
      * fix seg fault when finalize wait in receiver
      
      * [Feature] refactor sender connect logic and remove unnecessary sleeps in unit tests
      
      * fix lint
      
      * release RPCContext resources before process exits
      
      * [Debug] TPReceiver wait start log
      
      * [Debug] add log in get port
      
      * [Debug] add log
      
      * [ReDebug] revert time sleep in unit tests
      
      * [Debug] remove sleep for test_distri,test_mp
      
      * [debug] add more log
      
      * [debug] add listen_booted_ flag
      
      * [debug] restore commented code for queue
      
      * [debug] sleep more in rpc_client
      
      * restore change in tests
      
      * Revert "restore change in tests"
      
      This reverts commit 41a18926d181ec2517069389bfc41de2cc949280.
      
      * Revert "[debug] sleep more in rpc_client"
      
      This reverts commit a908e758eabca0a6ce62eb2e59baea02a840ac67.
      
      * Revert "[debug] restore commented code for queue"
      
      This reverts commit d3f993b3746e6bb6e2cc2f90204dd7e9461c6301.
      
      * Revert "[debug] add listen_booted_ flag"
      
      This reverts commit 244b2167d94942ff2a0acec8823b974975e52580.
      
      * Revert "[debug] add more log"
      
      This reverts commit 4b78447b0a575a824821dc7e25cca2246e6e30e2.
      
      * Revert "[Debug] remove sleep for test_distri,test_mp"
      
      This reverts commit e1df1aadcc8b1c2a0013ed77322ac391a8807612.
      
      * remove debug code
      
      * revert unnecessary change
      
      * revert unnecessary changes
      
      * always reset RPCContext when get started and reset all data
      
      * remove time.sleep in dist tests
      
      * fix lint
      
      * reset envs before each dist test
      
      * reset env properly
      
      * add time sleep when start each server
      
      * sleep for a while when boot server
      
      * replace wait_thread with callback
      
      * fix lint
      
      * add dglconnect handshake check
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  32. 10 Jan, 2022 1 commit