  1. 28 Oct, 2022 1 commit
  2. 19 Oct, 2022 1 commit
  3. 13 Oct, 2022 2 commits
  4. 11 Oct, 2022 1 commit
  5. 21 Sep, 2022 1 commit
  6. 19 Sep, 2022 2 commits
  7. 15 Sep, 2022 1 commit
    • [Feature] Import PyTorch's CUDA stream management (#4503) · 9a00cf19
      Xin Yao authored
      * add set_stream
      
      * add .record_stream for NDArray and HeteroGraph
      
      * refactor dgl stream Python APIs
      
      * test record_stream
      
      * add unit test for record stream
      
      * use pytorch's stream
      
      * fix lint
      
      * fix cpu build
      
      * address comments
      
      * address comments
      
      * add record stream tests for dgl.graph
      
      * record frames and update dataloader
      
      * add docstring
      
      * update frame
      
      * add backend check for record_stream
      
      * remove CUDAThreadEntry::stream
      
      * record stream for newly created formats
      
      * fix bug
      
      * fix cpp test
      
      * fix None c_void_p to c_handle
  8. 06 Sep, 2022 1 commit
    • [Feature] Unify the cuda stream used in core library (#4480) · 1c9d2a03
      Chang Liu authored
      
      
      * Use an internal cuda stream for CopyDataFromTo
      
      * small fix white space
      
      * Fix to compile
      
      * Make stream optional in copydata for compile
      
      * fix lint issue
      
      * Update cub functions to use internal stream
      
      * Lint check
      
      * Update CopyTo/CopyFrom/CopyFromTo to use internal stream
      
      * Address comments
      
      * Fix backward CUDA stream
      
      * Avoid overloading CopyFromTo()
      
      * Minor comment update
      
      * Overload copydatafromto in cuda device api
      Co-authored-by: xiny <xiny@nvidia.com>
  9. 05 Sep, 2022 2 commits
  10. 31 Aug, 2022 1 commit
    • [Feature] Make TensorAdapter Stream Aware (#4472) · 2b766740
      Xin Yao authored
      * Allocate tensors in DGL's current stream
      
      * make tensoradapter stream-aware
      
      * replace TAempty with cpu allocator
      
      * fix typo
      
      * try fix cpu allocation
      
      * clean header
      
      * redirect AllocDataSpace as well
      
      * resolve comments
  11. 23 Aug, 2022 1 commit
  12. 18 Aug, 2022 1 commit
  13. 15 Aug, 2022 1 commit
  14. 12 Aug, 2022 1 commit
  15. 09 Aug, 2022 1 commit
  16. 01 Aug, 2022 1 commit
  17. 29 Jul, 2022 1 commit
    • [Feature] Add CUDA Weighted Neighborhood Sampling (#4064) · 86c81b4e
      Xin Yao authored
      
      
      * add weighted sampling without replacement (A-Chao)
      
      * improve Algorithm A-Chao with block-wise prefix sum
      
      * correctly fill out_idxs
      
      * implement weighted sampling with replacement
      
      * small fix
      
      * merge host-side code of weighted/uniform sampling
      
      * enable unit tests for cuda weighted sampling
      
      * move thrust/cub wrapper to the cmake file
      
      * update docs accordingly
      
      * fix linting
      
      * fix linting
      
      * fix unit test
      
      * Bump external CUB/Thrust versions
      
      * Fix code style and update description of algorithm design
      
      * [Feature] GPU support weighted graph neighbor sampling
      commit by pengqirong(OPPO)
      
      * merge pengqirong's implementation
      
      * revert the change to cub and thrust
      
      * fix linting
      
      * use DeviceSegmentedSort for better performance
      
      * add more comments
      
      * add necessary notes
      
      * add necessary notes
      
      * resolve some comments
      
      * define THRUST_CUB_WRAPPED_NAMESPACE
      
      * fix doc
      Co-authored-by: 彭齐荣 <657017034@qq.com>
  18. 27 Jul, 2022 1 commit
  19. 26 Jul, 2022 1 commit
  20. 15 Jul, 2022 1 commit
  21. 09 Jul, 2022 1 commit
  22. 07 Jul, 2022 1 commit
  23. 01 Jul, 2022 2 commits
  24. 29 Jun, 2022 1 commit
  25. 27 Jun, 2022 2 commits
    • [Bug][Feature] Added more missing FP16 specializations (#4140) · a5d8460c
      ndickson-nvidia authored
      * * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
      * Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
      * Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas
      
      * * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM
      
      * * Added missing instantiation of DLDataTypeTraits<__half>::dtype
      
      * * Fixed linter error
      * Added clearer comment explaining why the cast to long long is necessary
      
      * * Worked around a compile error in some particular setup, where __half can't be constructed on the host side
      
      * * Fixed linter formatting errors
      
      * * Changes to comments as recommended
      
      * * Made recommended changes to logging errors in FP16 specializations
      * Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
    • [BugFix] fix rpc-related build issue on mac OS (#4168) · 10db5d0b
      Rhett Ying authored
      * [BugFix] fix rpc-related build issue on mac OS
      
      * add warning message
      
      * add warning message
  26. 24 Jun, 2022 1 commit
    • [Performance][Optimizer] Enable using UVA and FP16 with SparseAdam Optimizer (#3885) · 020f0249
      nv-dlasalle authored
      
      
      * Add uva by default to embedding
      
      * More updates
      
      * Update optimizer
      
      * Add new uva functions
      
      * Expose new pinned memory function
      
      * Add unit tests
      
      * Update formatting
      
      * Fix unit test
      
      * Handle auto UVA case when training is on CPU
      
      * Allow per-embedding decisions for whether to use UVA
      
      * Address sparse_optim.py comments
      
      * Remove unused templates
      
      * Update unit test
      
      * Use dgl allocate memory for pinning
      
      * allow automatic unpinning
      
      * workaround for d2h copy with a different dtype
      
      * fix linting
      
      * update error message
      
      * update copyright
      Co-authored-by: Xin Yao <xiny@nvidia.com>
      Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
  27. 23 Jun, 2022 2 commits
    • [Bugfix][Rework] Automatically unpin tensors pinned by DGL (rework #3997) (#4135) · 077e002f
      Xin Yao authored
      
      
      * Explicitly unpin tensoradapter allocated arrays
      
      * Undo unrelated change
      
      * Add unit test
      
      * update unit test
      
      * add pinned_by_dgl flag to NDArray::Container
      
      * use dgl.ndarray for holding the pinning status
      
      * update multi-gpu uva inference
      
      * reinterpret cast NDArray::Container* to DLTensor* in MoveAsDLTensor
      
      * update unpin column and examples
      
      * add unit test for unpin column
      Co-authored-by: Dominique LaSalle <dlasalle@nvidia.com>
      Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
    • [Fix] Fix compiler warnings - part 1 (#4051) · 1ad65879
      Triston authored
      
      
      * Fix a cub compile error for CUDA 11.5
      
      * Fix comparison of integer expressions of different signedness in coo_sort.cu file
      
      * Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file
      
      * Remove never referenced variable in spmm.cu
      
      * Fix comparison of integer expressions of different signedness in rowwise_pick.h file
      
      * Fix comparison of integer expressions of different signedness in choice.cc file
      
      * Remove never referenced variable col_data in spmat_op_impl_coo.cc
      
      * Remove never referenced variable allowed in global_uniform.cc
      
      * Fix comparison of integer expressions of different signedness in graph.cc
      
      * Fix comparison of integer expressions of different signedness in graph_apis.cc
      
      * Fix the un-used ctx variable in ndarray_partition.cc file for cpu only build
      
      * Fix comparison of integer expressions of different signedness in libra_partition.cc
      
      * Fix comparison of integer expressions of different signedness in graph_op.cc
      Co-authored-by: Triston Cao <tristonc@nvidia.com>
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
  28. 20 Jun, 2022 1 commit
  29. 14 Jun, 2022 1 commit
  30. 11 Jun, 2022 1 commit
  31. 08 Jun, 2022 1 commit
  32. 07 Jun, 2022 1 commit
  33. 06 Jun, 2022 2 commits
    • [Bug] Added common operations for FP16 on older GPUs (#4079) · ea44da50
      ndickson-nvidia authored
      * * Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
      * Fixed an issue with previous check for FP16 support
      
      * * Removing FP16 type checks, since they should no longer be needed
      
      * * Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures.  Unfortunately, atomicCAS for unsigned short seems to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.
      
      * * Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value
    • parallelize csr2coo (#4081) · 31a81438
      Quan (Andy) Gan authored
      
      Co-authored-by: Xin Yao <xiny@nvidia.com>