Commits · 099b173f6f678d576a727ee4ad170599ec466f4e · OpenDAS / dgl

06 Sep, 2022 1 commit

[Feature] Unify the cuda stream used in core library (#4480) · 1c9d2a03

Chang Liu authored Sep 05, 2022



* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>

1c9d2a03

12 Aug, 2022 1 commit
- [Performance] Improve the performance of SpMMCsr by reconfiguration (#4363) · 2523bc7a
  Xin Yao authored Aug 12, 2022
```
* Change CUDA_MAX_NUM_THREADS to 256

* change the configuration of grid
```
  2523bc7a
27 Jun, 2022 1 commit

[Bug][Feature] Added more missing FP16 specializations (#4140) · a5d8460c

ndickson-nvidia authored Jun 27, 2022

* * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas

* * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM

* * Added missing instantiation of DLDataTypeTraits<__half>::dtype

* * Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary

* * Worked around a compile error in some particular setup, where __half can't be constructed on the host side

* * Fixed linter formatting errors

* * Changes to comments as recommended

* * Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)

a5d8460c

06 Jun, 2022 1 commit

[Bug] Added common operations for FP16 on older GPUs (#4079) · ea44da50

ndickson-nvidia authored Jun 06, 2022

* * Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
* Fixed an issue with previous check for FP16 support

* * Removing FP16 type checks, since they should no longer be needed

* * Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures.  Unfortunately, it seems that atomicCAS for unsigned short seems to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.

* * Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return old value instead value of new

ea44da50

26 May, 2022 1 commit

[Build][Tests] Enable FP16 for GPU builds in CI (#4030) · 7a065a9c

nv-dlasalle authored May 26, 2022

* Enable FP16 for GPU builds in CI

* Limit default GPU archs to pascal and above

* Disable FP16 dispatching for cuda architectures less than 60

* Fix linting

* Fix typos

7a065a9c

18 Feb, 2022 1 commit

[Performance][GPU] Improving _SegmentMaskColKernel (#3745) · 7b9afbfa

ayasar70 authored Feb 18, 2022



* Based on issue #3436. Improving _SegmentCopyKernel s GPU utilization by switching to nonzero based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases

* timing repeatkernel

* clean

* clean

* clean

* updating _SegmentMaskColKernel

* Working on requests: removing sorted array check and adding comments to utility functions

* fixing lint issue
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

7b9afbfa

07 Jan, 2022 1 commit

[Feature] Negative sampling (#3599) · 90f10b31

Quan (Andy) Gan authored Jan 07, 2022

* first commit

* a bunch of fixes

* add unique

* lint

* lint

* lint

* address comments

* Update negative_sampler.py

* fix

* description

* address comments and fix

* fix

* replace unique with replace

* test pylint

* Update negative_sampler.py

90f10b31

27 Apr, 2021 1 commit

[Feature] Add cuda support for Sparse Matrix multiplication, summation and masking (#2782) · ab2bd1f1

Israt Nisa authored Apr 27, 2021



* init cuda support

* cuSPARSE err

* passed unittest for csr_mm/SpGEMM. int64 not supported

* Debugging cuSPARSE error 3

* csrgeam only supports int32?

* disabling int64 for cuda

* refactor and add CSRMask

* lint

* oops

* remove todo

* rewrite CSRMask with CSRGetData

* lint

* fix test

* address comments

* lint

* fix

* addresses comments and rename BUG_ON
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-71.ec2.internal>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

ab2bd1f1

25 Mar, 2021 1 commit
- [Bug] Disable cpu fp16 (#2783) · 0b57ce18
  Quan (Andy) Gan authored Mar 25, 2021
```
* disable cpu fp16

* spell mistakes
```
  0b57ce18
28 Jan, 2021 1 commit

[feature] Supporting half precision floating data type (fp16). (#2552) · 7bab1365

Zihao Ye authored Jan 28, 2021



* add tvm as submodule

* compilation is ok but calling fails

* can call now

* pack multiple modules, change names

* upd

* upd

* upd

* fix cmake

* upd

* upd

* upd

* upd

* fix

* relative path

* upd

* upd

* upd

* singleton

* upd

* trigger

* fix

* upd

* count reducible

* upd

* upd

* upd

* upd

* upd

* upd

* upd

* upd

* upd

* only keep related files

* upd

* upd

* upd

* upd

* lint

* lint

* lint

* lint

* pylint

* upd

* upd

* compilation

* fix

* upd

* upd

* upd

* upd

* upd

* upd

* upd doc

* refactor

* fix

* upd number
Co-authored-by: Zhi Lin <linzhilynn@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-78.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-156.us-east-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

7bab1365

25 Jan, 2021 1 commit
- [feature] Implement missing CUDA operators for COO format (part 1). (#2565) · 0f9056ed
  Zihao Ye authored Jan 25, 2021
```
* upd

* upd

* upd

* upd

* fix

* upd

* upd
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
```
  0f9056ed
10 Sep, 2020 1 commit

[performance] Batch DGLGraph in C++ end. (#2155) · cbd55eb1

Zihao Ye authored Sep 11, 2020



* upd

* upd

* upd

* upd

* upd

* upd

* upd

* upd

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* fix

* upd
Co-authored-by: VoVAllen <jz1749@nyu.edu>

cbd55eb1

28 Jun, 2020 1 commit

[CUDA][Kernel] More CUDA kernels; Standardize the behavior for sorted COO/CSR (#1704) · 870da747

Minjie Wang authored Jun 28, 2020

* add cub; array cumsum

* CSRSliceRows

* fix warning

* operator << for ndarray; CSRSliceRows

* add CSRIsSorted

* add csr_sort

* inplace coosort and outplace csrsort

* WIP: coo is sorted

* mv cuda_utils

* add AllTrue utility

* csr sort

* coo sort

* coo2csr for sorted coo arrays

* CSRToCOO from sorted

* pass tests for the new kernel changes

* cannot use inplace sort

* lint

* try fix msvc error

* Fix g.copy_to and g.asnumbits; ToBlock no longer uses CSC

* stash

* revert some hack

* revert some changes

* address comments

* fix

* fix to_block unittest

* add todo note

870da747

22 Jun, 2020 1 commit

[kernel] New SpMM & SDDMM kernel on CPU and CUDA (#1644) · 071cba1f

Zihao Ye authored Jun 22, 2020

* udp

* simplify

* sddmm dot cpu

* upd

* format

* upd

* compatible with MJ's PR

* lint

* upd

* upd

* upd

* python end

* upd

* upd

* lint

* lint

* upd

* upd

* upd

* upd

* upd

* lint

* fix mxnet

* upd

* lint

* use minjie's ptr

* macro

* upd

* reorg

* lint

* fix corner cases

* upd

* enrich cpu docs

* upd

* upd

* lint

* lint

* pylint

* sx review

* improve docstring

* python doc

* upd

* restructure

* lint

* upd test

* upd

* pylint

* fix corner cases and test

071cba1f

19 Jun, 2020 1 commit

[CUDA] Many CUDA operators; Prepare for DGLGraph on CUDA (#1660) · f1b19a6b

Minjie Wang authored Jun 19, 2020

* add cuda utils; change g.to; add g.device

* split array.h into several headers

* cuda index select

* file

* three cuda kernels

* add cuda elementwise arith and several others

* cuda CSRIsNonZero

* fix lint

* lint

* lint

* fix bug in changing ctx to property

* address comments

* remove unused codes

* address comments

f1b19a6b