1. 27 Jul, 2022 1 commit
  2. 26 Jul, 2022 1 commit
  3. 15 Jul, 2022 1 commit
  4. 09 Jul, 2022 1 commit
  5. 07 Jul, 2022 1 commit
  6. 01 Jul, 2022 2 commits
  7. 29 Jun, 2022 1 commit
  8. 27 Jun, 2022 2 commits
    • [Bug][Feature] Added more missing FP16 specializations (#4140) · a5d8460c
      ndickson-nvidia authored
      * * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
      * Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
      * Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas
      
      * * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM
      
      * * Added missing instantiation of DLDataTypeTraits<__half>::dtype
      
      * * Fixed linter error
      * Added clearer comment explaining why the cast to long long is necessary
      
      * * Worked around a compile error in some particular setup, where __half can't be constructed on the host side
      
      * * Fixed linter formatting errors
      
      * * Changes to comments as recommended
      
      * * Made recommended changes to logging errors in FP16 specializations
      * Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
    • [BugFix] fix rpc-related build issue on mac OS (#4168) · 10db5d0b
      Rhett Ying authored
      * [BugFix] fix rpc-related build issue on mac OS
      
      * add warning message
      
      * add warning message
  9. 24 Jun, 2022 1 commit
    • [Performance][Optimizer] Enable using UVA and FP16 with SparseAdam Optimizer (#3885) · 020f0249
      nv-dlasalle authored
      
      
      * Add uva by default to embedding
      
      * More updates
      
      * Update optimizer
      
      * Add new uva functions
      
      * Expose new pinned memory function
      
      * Add unit tests
      
      * Update formatting
      
      * Fix unit test
      
      * Handle auto UVA case when training is on CPU
      
      * Allow per-embedding decisions for whether to use UVA
      
      * Address sparse_optim.py comments
      
      * Remove unused templates
      
      * Update unit test
      
      * Use dgl allocate memory for pinning
      
      * allow automatically unpin
      
      * workaround for d2h copy with a different dtype
      
      * fix linting
      
      * update error message
      
      * update copyright
      Co-authored-by: Xin Yao <xiny@nvidia.com>
      Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
  10. 23 Jun, 2022 2 commits
    • [Bugfix][Rework] Automatically unpin tensors pinned by DGL (rework #3997) (#4135) · 077e002f
      Xin Yao authored
      
      
      * Explicitly unpin tensoradapter allocated arrays
      
      * Undo unrelated change
      
      * Add unit test
      
      * update unit test
      
      * add pinned_by_dgl flag to NDArray::Container
      
      * use dgl.ndarray for holding the pinning status
      
      * update multi-gpu uva inference
      
      * reinterpret cast NDArray::Container* to DLTensor* in MoveAsDLTensor
      
      * update unpin column and examples
      
      * add unit test for unpin column
      Co-authored-by: Dominique LaSalle <dlasalle@nvidia.com>
      Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
    • [Fix] Fix compiler warnings - part 1 (#4051) · 1ad65879
      Triston authored
      
      
      * Fix a cub compile error for CUDA 11.5
      
      * Fix comparison of integer expressions of different signedness in coo_sort.cu file
      
      * Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file
      
      * Remove never referenced variable in spmm.cu
      
      * Fix comparison of integer expressions of different signedness in rowwise_pick.h file
      
      * Fix comparison of integer expressions of different signedness in choice.cc file
      
      * Remove never referenced variable col_data in spat_op_impl_coo.cc
      
      * Remove never referenced variable allowed in global_uniform.cc
      
      * Fix comparison of integer expressions of different signedness in graph.cc
      
      * Fix comparison of integer expressions of different signedness in graph_apis.cc
      
      * Fix the unused ctx variable in ndarray_partition.cc file for CPU-only build
      
      * Fix comparison of integer expressions of different signedness in libra_partition.cc
      
      * Fix comparison of integer expressions of different signedness in graph_op.cc
      Co-authored-by: Triston Cao <tristonc@nvidia.com>
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
  11. 20 Jun, 2022 1 commit
  12. 14 Jun, 2022 1 commit
  13. 11 Jun, 2022 1 commit
  14. 08 Jun, 2022 1 commit
  15. 07 Jun, 2022 1 commit
  16. 06 Jun, 2022 3 commits
  17. 28 May, 2022 3 commits
  18. 26 May, 2022 1 commit
  19. 25 May, 2022 1 commit
  20. 17 May, 2022 1 commit
  21. 16 May, 2022 2 commits
  22. 12 May, 2022 1 commit
  23. 11 May, 2022 1 commit
  24. 27 Apr, 2022 1 commit
    • [Feature] enable socket net_type for rpc (#3951) · 37be02a4
      Rhett Ying authored
      * [Feature] enable socket net_type for rpc
      
      * fix lint
      
      * fix lint
      
      * fix build issue on windows
      
      * fix test failure on windows
      
      * fix test failure
      
      * fix cpp unit test failure
      
      * net_type blocking max_try_times
      
      * fix other comments
      
      * fix lint
      
      * fix comment
      
      * fix lint
      
      * fix cpp
  25. 26 Apr, 2022 1 commit
  26. 12 Apr, 2022 1 commit
  27. 11 Apr, 2022 1 commit
  28. 09 Apr, 2022 1 commit
  29. 05 Apr, 2022 1 commit
    • [Examples] Update graphsage multi-gpu example to use multiple GPUs for validation and testing. (#3827) · 27a6eb56
      nv-dlasalle authored

      * Update graphsage multi-gpu example to use multiple GPUs for validation and testing.
      
      * Remove argmax
      
      * Fix rebase error
      
      * Add more documentation to example and simplify
      
      * Switch to name shared memory
      
      * Add comment about how training is distributed
      
      * Restore iteration count
      
      * fix munmap error reporting for better error messages
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
  30. 31 Mar, 2022 1 commit
  31. 27 Mar, 2022 1 commit
    • [Feature] METIS Partition with Communication Volume Minimization (#3821) · fbbca994
      Cheng Wan authored
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * fix OpenMP compatibility issues
      
      * typo
      
      * partition
      
      * misc
      
      * fix typo
      
      * num_parts=1
      
      * import torch
      
      * long
      
      * print info
      
      * print info
      
      * print info
      
      * upd
      
      * remove debug code
      
      * revert partition.py
      
      * fix cut count
      
      * fix cut count
      
      * Revert "fix cut count"
      
      This reverts commit 10926b4fd48f45c8f1ddb58be7db6c22e653effd.
      
      * Revert "fix cut count"
      
      This reverts commit 76465283bef093a2b4209ad70dd15d2437b2ec8a.
      
      * type of deprecate
      
      * typo in deprecate info
      
      * fix typo
      
      * use cv for partitioning
      
      * CE
      
      * no message
      
      * revert
      
      * typo
      
      * add objtype
      
      * no message
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * ?
      
      * semicolon
      
      * drop tensors
      
      * no message
      
      * backward
      
      * backward
      
      * max op
      
      * store X.shape
      
      * th
      
      * test
      
      * Revert "test"
      
      This reverts commit 92b3b2f64a3a1128590098fa03ce429c5466e6ce.
      
      * test
      
      * tolist
      
      * debug
      
      * to cuda
      
      * tuple
      
      * fix bug
      
      * remove X
      
      * no message
      
      * fix bug
      
      * workload balance
      
      * Revert "workload balance"
      
      This reverts commit d7f8e4a16ba2a7eabb4a9bb945523bfe6623e723.
      
      * reverse
      
      * Revert "reverse"
      
      This reverts commit 8a71cf25685aa7d889b9b8881b46f7a16b7d6e6d.
      
      * Revert "Revert "reverse""
      
      This reverts commit 196b143932d5cf9813576ece7c990b63d322d063.
      
      * Revert "Revert "Revert "reverse"""
      
      This reverts commit cf9e89a07013582056e7cde235e51331aca7fa9c.
      
      * no message
      
      * Merge commit '5498cf05'
      
      # Conflicts:
      #	python/dgl/distributed/partition.py
      
      * Revert "Merge commit '5498cf05'"
      
      This reverts commit f79be2ad777897c7025b28308454cad81ad6bb27.
      
      * fix bug
      
      * third party
      
      * no message
      
      * try to avoid memory leak
      
      * try to avoid memory leak
      
      * avoid memory leak with no hope
      
      * Revert "avoid memory leak with no hope"
      
      This reverts commit c77befe9479f46758e744642f66dd209b50eef7d.
      
      * no message
      
      * Revert "no message"
      
      This reverts commit 478cb28fe25fb1002b2f1dc202bb9bdaad8b2a56.
      
      * del
      
      * Revert "del"
      
      This reverts commit 1b468e45ce646b400ff3ffa61a0b2da058b3bdfd.
      
      * no message
      
      * no message
      
      * Revert "no message"
      
      This reverts commit 92e4f5561ed42da0606618b2fff9f1ad5ed439d9.
      
      * third party
      
      * document
      
      * Update metis_partition.cc
      
      * Update metis_partition_hetero.cc
      
      * Update metis_partition_hetero.cc
      
      * Update partition.py
      
      * Update partition.py
      
      * Update partition.py
      Co-authored-by: yzh119 <expye@outlook.com>
      Co-authored-by: chwan-rice <54331508+chwan-rice@users.noreply.github.com>
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
      Co-authored-by: Da Zheng <zhengda1936@gmail.com>
  32. 24 Mar, 2022 1 commit