- 23 Nov, 2023 1 commit
Muhammed Fatih BALIN authored
-
- 22 Nov, 2023 1 commit
Muhammed Fatih BALIN authored
-
- 14 Jul, 2023 1 commit
Muhammed Fatih BALIN authored
-
- 09 Dec, 2022 1 commit
Xin Yao authored
* fix empty tensor being treated as pinned * avoid calling cudaHostGetDevicePointer on nullptr * update empty array * add a comment
-
- 07 Nov, 2022 3 commits
Hongzhi (Steve), Chen authored
* blabla * more * blabla * blabla * ablabla * blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix. * blabla * ablabla * blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Hongzhi (Steve), Chen authored
* replace * blabla * balbla * blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 06 Nov, 2022 2 commits
Hongzhi (Steve), Chen authored
* param * brief * note * return * tparam * brief2 * file * return2 * return * blabla * all
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Xin Yao authored
* add bf16 specializations * remove SWITCH_BITS * enable amp for bf16 * remove SWITCH_BITS for cpu kernels * enable bf16 based on CUDART * fix compiling for sm<80 * fix cpu build * enable unit tests * update doc * disable test for CUDA < 11.0 * address comments * address comments
-
- 28 Oct, 2022 1 commit
Quan (Andy) Gan authored
* sample neighbors with masks * oops * refactor again * remove * remove debug code * rename macro * address comments * address comment * address comments * rename a lot of stuff * oops
-
- 19 Sep, 2022 1 commit
Xin Yao authored
* rename `DLContext` to `DGLContext` * rename `kDLGPU` to `kDLCUDA` * replace DLTensor with DGLArray * fix linting * Unify DGLType and DLDataType to DGLDataType * Fix FFI * rename DLDeviceType to DGLDeviceType * decouple dlpack from the core library * fix bug * fix lint * fix merge * fix build * address comments * rename dl_converter to dlpack_convert * remove redundant comments
-
- 15 Sep, 2022 1 commit
Xin Yao authored
* add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloader * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle
-
- 06 Sep, 2022 1 commit
Chang Liu authored
* Use an internal cuda stream for CopyDataFromTo * small fix white space * Fix to compile * Make stream optional in copydata for compile * fix lint issue * Update cub functions to use internal stream * Lint check * Update CopyTo/CopyFrom/CopyFromTo to use internal stream * Address comments * Fix backward CUDA stream * Avoid overloading CopyFromTo() * Minor comment update * Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>
-
- 12 Aug, 2022 1 commit
Xin Yao authored
* Change CUDA_MAX_NUM_THREADS to 256 * change the configuration of grid
-
- 27 Jun, 2022 1 commit
ndickson-nvidia authored
* Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU` * Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half` * Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM * Added missing instantiation of DLDataTypeTraits<__half>::dtype * Fixed linter error * Added clearer comment explaining why the cast to long long is necessary * Worked around a compile error in some particular setup, where __half can't be constructed on the host side * Fixed linter formatting errors * Changes to comments as recommended * Made recommended changes to logging errors in FP16 specializations * Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
-
- 06 Jun, 2022 1 commit
ndickson-nvidia authored
* Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures * Fixed an issue with previous check for FP16 support * Removing FP16 type checks, since they should no longer be needed * Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures. Unfortunately, atomicCAS for unsigned short seems to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs. * Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value
-
- 26 May, 2022 1 commit
nv-dlasalle authored
* Enable FP16 for GPU builds in CI * Limit default GPU archs to pascal and above * Disable FP16 dispatching for cuda architectures less than 60 * Fix linting * Fix typos
-
- 18 Feb, 2022 1 commit
ayasar70 authored
* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero based thread assignment * fixing lint issues * Update cub for cuda 11.5 compatibility (#3468) * fixing type mismatch * tx guaranteed to be smaller than nnz. Hence removing last check * minor: updating comment * adding three unit tests for csr slice method to cover some corner cases * timing repeatkernel * clean * clean * clean * updating _SegmentMaskColKernel * Working on requests: removing sorted array check and adding comments to utility functions * fixing lint issue
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 07 Jan, 2022 1 commit
Quan (Andy) Gan authored
* first commit * a bunch of fixes * add unique * lint * lint * lint * address comments * Update negative_sampler.py * fix * description * address comments and fix * fix * replace unique with replace * test pylint * Update negative_sampler.py
-
- 27 Apr, 2021 1 commit
Israt Nisa authored
* init cuda support * cuSPARSE err * passed unittest for csr_mm/SpGEMM. int64 not supported * Debugging cuSPARSE error 3 * csrgeam only supports int32? * disabling int64 for cuda * refactor and add CSRMask * lint * oops * remove todo * rewrite CSRMask with CSRGetData * lint * fix test * address comments * lint * fix * addresses comments and rename BUG_ON
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-71.ec2.internal>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 25 Mar, 2021 1 commit
Quan (Andy) Gan authored
* disable cpu fp16 * spell mistakes
-
- 28 Jan, 2021 1 commit
Zihao Ye authored
* add tvm as submodule * compilation is ok but calling fails * can call now * pack multiple modules, change names * upd * upd * upd * fix cmake * upd * upd * upd * upd * fix * relative path * upd * upd * upd * singleton * upd * trigger * fix * upd * count reducible * upd * upd * upd * upd * upd * upd * upd * upd * upd * only keep related files * upd * upd * upd * upd * lint * lint * lint * lint * pylint * upd * upd * compilation * fix * upd * upd * upd * upd * upd * upd * upd doc * refactor * fix * upd number
Co-authored-by: Zhi Lin <linzhilynn@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-78.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-156.us-east-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 25 Jan, 2021 1 commit
Zihao Ye authored
* upd * upd * upd * upd * fix * upd * upd
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 10 Sep, 2020 1 commit
Zihao Ye authored
* upd * upd * upd * upd * upd * upd * upd * upd * upd * upd * upd * fix * upd * upd * upd * upd * fix * upd
Co-authored-by: VoVAllen <jz1749@nyu.edu>
-
- 28 Jun, 2020 1 commit
Minjie Wang authored
* add cub; array cumsum * CSRSliceRows * fix warning * operator << for ndarray; CSRSliceRows * add CSRIsSorted * add csr_sort * inplace coosort and outplace csrsort * WIP: coo is sorted * mv cuda_utils * add AllTrue utility * csr sort * coo sort * coo2csr for sorted coo arrays * CSRToCOO from sorted * pass tests for the new kernel changes * cannot use inplace sort * lint * try fix msvc error * Fix g.copy_to and g.asnumbits; ToBlock no longer uses CSC * stash * revert some hack * revert some changes * address comments * fix * fix to_block unittest * add todo note
-
- 22 Jun, 2020 1 commit
Zihao Ye authored
* upd * simplify * sddmm dot cpu * upd * format * upd * compatible with MJ's PR * lint * upd * upd * upd * python end * upd * upd * lint * lint * upd * upd * upd * upd * upd * lint * fix mxnet * upd * lint * use minjie's ptr * macro * upd * reorg * lint * fix corner cases * upd * enrich cpu docs * upd * upd * lint * lint * pylint * sx review * improve docstring * python doc * upd * restructure * lint * upd test * upd * pylint * fix corner cases and test
-
- 19 Jun, 2020 1 commit
Minjie Wang authored
* add cuda utils; change g.to; add g.device * split array.h into several headers * cuda index select * file * three cuda kernels * add cuda elementwise arith and several others * cuda CSRIsNonZero * fix lint * lint * lint * fix bug in changing ctx to property * address comments * remove unused code * address comments
-