1. 06 Sep, 2022 1 commit
    • [Feature] Unify the cuda stream used in core library (#4480) · 1c9d2a03
      Chang Liu authored
      
      
      * Use an internal cuda stream for CopyDataFromTo
      
      * Small whitespace fix
      
      * Fix to compile
      
      * Make stream optional in copydata for compile
      
      * fix lint issue
      
      * Update cub functions to use internal stream
      
      * Lint check
      
      * Update CopyTo/CopyFrom/CopyFromTo to use internal stream
      
      * Address comments
      
      * Fix backward CUDA stream
      
      * Avoid overloading CopyFromTo()
      
      * Minor comment update
      
      * Overload copydatafromto in cuda device api

      Co-authored-by: xiny <xiny@nvidia.com>
  2. 12 Aug, 2022 1 commit
  3. 27 Jun, 2022 1 commit
    • [Bug][Feature] Added more missing FP16 specializations (#4140) · a5d8460c
      ndickson-nvidia authored
      * * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
      * Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
      * Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas
      
      * * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM
      
      * * Added missing instantiation of DLDataTypeTraits<__half>::dtype
      
      * * Fixed linter error
      * Added clearer comment explaining why the cast to long long is necessary
      
      * * Worked around a compile error in some particular setup, where __half can't be constructed on the host side
      
      * * Fixed linter formatting errors
      
      * * Changes to comments as recommended
      
      * * Made recommended changes to logging errors in FP16 specializations
      * Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
  4. 06 Jun, 2022 1 commit
    • [Bug] Added common operations for FP16 on older GPUs (#4079) · ea44da50
      ndickson-nvidia authored
      * * Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
      * Fixed an issue with the previous check for FP16 support
      
      * * Removing FP16 type checks, since they should no longer be needed
      
      * * Fixed AtomicAdd to be atomic for `float` and `double` on old GPU architectures. Unfortunately, atomicCAS for unsigned short appears to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.
      
      * * Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value
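
The AtomicAdd notes in #4079 rely on a compare-and-swap retry loop that returns the value held before the add. The following is a minimal host-side C++ sketch of that general pattern, not the CUDA code from the commit: it uses `std::atomic` in place of `atomicCAS`, and the names `DoubleToBits`, `BitsToDouble`, and `AtomicAddViaCAS` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Hypothetical helpers (illustration only, not DGL code): reinterpret a
// double as its 64-bit pattern and back without undefined behavior.
inline std::uint64_t DoubleToBits(double v) {
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return bits;
}

inline double BitsToDouble(std::uint64_t bits) {
  double v;
  std::memcpy(&v, &bits, sizeof(v));
  return v;
}

// Atomically adds `val` to the double stored bitwise in `*cell` and returns
// the value that was stored before the add (the "old value").
inline double AtomicAddViaCAS(std::atomic<std::uint64_t>* cell, double val) {
  std::uint64_t old_bits = cell->load();
  for (;;) {
    const double old_val = BitsToDouble(old_bits);
    const std::uint64_t new_bits = DoubleToBits(old_val + val);
    // On failure, compare_exchange_weak refreshes old_bits with the current
    // contents of the cell, so the next iteration retries with fresh data.
    if (cell->compare_exchange_weak(old_bits, new_bits)) {
      return old_val;
    }
  }
}
```

The loop needs a compare-and-swap on an integer as wide as the value being updated, which is consistent with the commit's remark that `half` cannot be made atomic this way where a 16-bit `atomicCAS` is unavailable.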
  5. 26 May, 2022 1 commit
  6. 18 Feb, 2022 1 commit
  7. 07 Jan, 2022 1 commit
    • [Feature] Negative sampling (#3599) · 90f10b31
      Quan (Andy) Gan authored
      * first commit
      
      * a bunch of fixes
      
      * add unique
      
      * lint
      
      * lint
      
      * lint
      
      * address comments
      
      * Update negative_sampler.py
      
      * fix
      
      * description
      
      * address comments and fix
      
      * fix
      
      * replace unique with replace
      
      * test pylint
      
      * Update negative_sampler.py
  8. 27 Apr, 2021 1 commit
  9. 25 Mar, 2021 1 commit
  10. 28 Jan, 2021 1 commit
  11. 25 Jan, 2021 1 commit
  12. 10 Sep, 2020 1 commit
  13. 28 Jun, 2020 1 commit
    • [CUDA][Kernel] More CUDA kernels; Standardize the behavior for sorted COO/CSR (#1704) · 870da747
      Minjie Wang authored
      * add cub; array cumsum
      
      * CSRSliceRows
      
      * fix warning
      
      * operator << for ndarray; CSRSliceRows
      
      * add CSRIsSorted
      
      * add csr_sort
      
      * inplace coosort and outplace csrsort
      
      * WIP: coo is sorted
      
      * mv cuda_utils
      
      * add AllTrue utility
      
      * csr sort
      
      * coo sort
      
      * coo2csr for sorted coo arrays
      
      * CSRToCOO from sorted
      
      * pass tests for the new kernel changes
      
      * cannot use inplace sort
      
      * lint
      
      * try fix msvc error
      
      * Fix g.copy_to and g.asnumbits; ToBlock no longer uses CSC
      
      * stash
      
      * revert some hack
      
      * revert some changes
      
      * address comments
      
      * fix
      
      * fix to_block unittest
      
      * add todo note
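
The "coo2csr for sorted coo arrays" item in #1704 refers to a conversion that becomes cheap once the COO row ids are sorted: only the row pointer array has to be built, and the column ids can be reused unchanged as CSR indices. Below is a minimal CPU sketch of that idea under those assumptions. It is not the CUDA kernel from the commit, and `SortedCooRowsToIndptr` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): builds the CSR row pointer array
// from COO row ids that are already sorted in ascending order.
std::vector<std::int64_t> SortedCooRowsToIndptr(
    const std::vector<std::int64_t>& row, std::int64_t num_rows) {
  std::vector<std::int64_t> indptr(static_cast<std::size_t>(num_rows) + 1, 0);
  // Count the nonzeros of each row, stored one slot to the right.
  for (std::size_t i = 0; i < row.size(); ++i) {
    assert(row[i] >= 0 && row[i] < num_rows);  // row id in range
    assert(i == 0 || row[i - 1] <= row[i]);    // input must be sorted
    ++indptr[static_cast<std::size_t>(row[i]) + 1];
  }
  // An inclusive prefix sum turns the counts into row start offsets.
  for (std::size_t r = 0; r < static_cast<std::size_t>(num_rows); ++r) {
    indptr[r + 1] += indptr[r];
  }
  return indptr;
}
```

For example, row ids `{0, 0, 2, 3, 3, 3}` with four rows give the row pointers `{0, 2, 2, 3, 6}`; row 1 is empty, so its start and end offsets coincide.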
  14. 22 Jun, 2020 1 commit
    • [kernel] New SpMM & SDDMM kernel on CPU and CUDA (#1644) · 071cba1f
      Zihao Ye authored
      * upd
      
      * simplify
      
      * sddmm dot cpu
      
      * upd
      
      * format
      
      * upd
      
      * compatible with MJ's PR
      
      * lint
      
      * upd
      
      * upd
      
      * upd
      
      * python end
      
      * upd
      
      * upd
      
      * lint
      
      * lint
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * upd
      
      * lint
      
      * fix mxnet
      
      * upd
      
      * lint
      
      * use minjie's ptr
      
      * macro
      
      * upd
      
      * reorg
      
      * lint
      
      * fix corner cases
      
      * upd
      
      * enrich cpu docs
      
      * upd
      
      * upd
      
      * lint
      
      * lint
      
      * pylint
      
      * sx review
      
      * improve docstring
      
      * python doc
      
      * upd
      
      * restructure
      
      * lint
      
      * upd test
      
      * upd
      
      * pylint
      
      * fix corner cases and test
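
For readers unfamiliar with the two operations named in #1644, the sketch below shows plain reference versions on CPU: SpMM multiplies a sparse CSR matrix by a dense matrix, and SDDMM (dot variant) computes one dot product per nonzero of the sparse pattern. This is an illustration of the standard definitions only, not the kernels added by the commit. The function names, the row-major layout, and the `float` element type are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference SpMM: out = A * X, where A is CSR
// (indptr, indices, data) and X is dense, row-major, with `dim` columns.
std::vector<float> CsrSpMM(const std::vector<std::int64_t>& indptr,
                           const std::vector<std::int64_t>& indices,
                           const std::vector<float>& data,
                           const std::vector<float>& x, std::size_t dim) {
  assert(!indptr.empty());
  const std::size_t num_rows = indptr.size() - 1;
  std::vector<float> out(num_rows * dim, 0.0f);
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (std::int64_t e = indptr[r]; e < indptr[r + 1]; ++e) {
      const std::size_t col = static_cast<std::size_t>(indices[e]);
      const float w = data[static_cast<std::size_t>(e)];
      // Accumulate the weighted source row into the destination row.
      for (std::size_t k = 0; k < dim; ++k) {
        out[r * dim + k] += w * x[col * dim + k];
      }
    }
  }
  return out;
}

// Hypothetical reference SDDMM (dot): for each nonzero (r, c) of the CSR
// pattern, out[e] = dot(a[r, :], b[c, :]).
std::vector<float> CsrSddmmDot(const std::vector<std::int64_t>& indptr,
                               const std::vector<std::int64_t>& indices,
                               const std::vector<float>& a,
                               const std::vector<float>& b, std::size_t dim) {
  assert(!indptr.empty());
  const std::size_t num_rows = indptr.size() - 1;
  std::vector<float> out(indices.size(), 0.0f);
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (std::int64_t e = indptr[r]; e < indptr[r + 1]; ++e) {
      const std::size_t col = static_cast<std::size_t>(indices[e]);
      float dot = 0.0f;
      for (std::size_t k = 0; k < dim; ++k) {
        dot += a[r * dim + k] * b[col * dim + k];
      }
      out[static_cast<std::size_t>(e)] = dot;
    }
  }
  return out;
}
```

SpMM produces one output row per matrix row, while SDDMM produces one value per stored nonzero, which is why the two are commonly paired as node-side and edge-side aggregation primitives.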
  15. 19 Jun, 2020 1 commit
    • [CUDA] Many CUDA operators; Prepare for DGLGraph on CUDA (#1660) · f1b19a6b
      Minjie Wang authored
      * add cuda utils; change g.to; add g.device
      
      * split array.h into several headers
      
      * cuda index select
      
      * file
      
      * three cuda kernels
      
      * add cuda elementwise arith and several others
      
      * cuda CSRIsNonZero
      
      * fix lint
      
      * lint
      
      * lint
      
      * fix bug in changing ctx to property
      
      * address comments
      
      * remove unused code
      
      * address comments