Commits · 1947d87dd77eabe5893e277a52ecf0f9eb2f1063 · OpenDAS / dgl

23 Aug, 2022 1 commit
- fix unpinning when tensoradapter is not available (#4450) · 1947d87d
  Xin Yao authored Aug 23, 2022
  
  1947d87d
18 Aug, 2022 1 commit

[Feature] Rework Dataloader cpu affinitization as helper method (#4126) · 47993776

Daniil Sizov authored Aug 18, 2022



* Add helper method for temporary affinitization of compute threads

* Rework DL affinitization as single helper

* Add example usage in benchmarks

* Fix python linter warnings

* Fix affinity helper params

* Use NUMA node 0 cores only by default

* Fix benchmarks

* Fix lint errors
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

47993776
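
Note on #4126: the "temporary affinitization of compute threads" named in the bullets above can be pictured as a context manager that narrows the CPU mask and then restores it. What follows is a minimal sketch, assuming a Linux host that provides `os.sched_setaffinity`. The name `temporary_affinity` is hypothetical and is not the helper this commit added to DGL.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_affinity(cores):
    """Restrict the calling thread to `cores` inside the with-block.

    Hypothetical helper for illustration only; it is not DGL's API.
    On platforms without os.sched_setaffinity it does nothing.
    """
    if not hasattr(os, "sched_setaffinity"):
        yield
        return
    previous = os.sched_getaffinity(0)      # remember the current CPU mask
    os.sched_setaffinity(0, set(cores))     # pid 0 means the calling thread
    try:
        yield
    finally:
        os.sched_setaffinity(0, previous)   # restore it even on error
```

A caller would wrap only the compute-heavy section, for example `with temporary_affinity({0, 1}): ...`, so that code outside the block keeps its original mask.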

15 Aug, 2022 1 commit
- [Bugfix] Fix pinning empty tensors and graphs (#4393) · 3685000a
  Xin Yao authored Aug 15, 2022
  
  3685000a
12 Aug, 2022 1 commit
- [Performance] Improve the performance of SpMMCsr by reconfiguration (#4363) · 2523bc7a
  Xin Yao authored Aug 12, 2022
```
* Change CUDA_MAX_NUM_THREADS to 256

* change the configuration of grid
```
  2523bc7a
09 Aug, 2022 1 commit
- [Bug] Fix broken static_assert (#4342) · 182e1ad5
  Xin Yao authored Aug 09, 2022
  
  182e1ad5
01 Aug, 2022 1 commit

[Feature] Enable UVA for Weighted Samplers (#4314) · 44b68641

Xin Yao authored Aug 01, 2022

* enable use for weighted neighbor sampler and biased random walk

* add unit tests

* fix for mxnet/tf

* fix typo

44b68641

29 Jul, 2022 1 commit

[Feature] Add CUDA Weighted Neighborhood Sampling (#4064) · 86c81b4e

Xin Yao authored Jul 29, 2022



* add weighted sampling without replacement (A-Chao)

* improve Algorithm A-Chao with block-wise prefix sum

* correctly fill out_idxs

* implement weighted sampling with replacement

* small fix

* merge host-side code of weighted/uniform sampling

* enable unit tests for cuda weighted sampling

* move thrust/cub wrapper to the cmake file

* update docs accordingly

* fix linting

* fix linting

* fix unit test

* Bump external CUB/Thrust versions

* Fix code style and update description of algorithm design

* [Feature] GPU support weighted graph neighbor sampling
commit by pengqirong(OPPO)

* merge pengqirong's implementation

* revert the change to cub and thrust

* fix linting

* use DeviceSegmentedSort for better performance

* add more comments

* add necessary notes

* add necessary notes

* resolve some comments

* define THRUST_CUB_WRAPPED_NAMESPACE

* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>

86c81b4e
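
Note on #4064: the first bullet names Algorithm A-Chao for weighted sampling without replacement. The sketch below is a simplified, sequential version of that reservoir scheme, written only to show the idea. It assumes an inclusion probability of `k * w / W` clamped to 1 and does not give overweight items the separate treatment the full algorithm has. It is not the block-wise prefix-sum CUDA kernel from this commit, and `a_chao_sample` is a hypothetical name.

```python
import random


def a_chao_sample(items, weights, k, rng=None):
    """Pick up to k distinct items, favouring larger weights (simplified A-Chao)."""
    rng = rng or random.Random()
    reservoir = []
    w_sum = 0.0
    for item, w in zip(items, weights):
        w_sum += w
        if len(reservoir) < k:
            reservoir.append(item)          # the first k items fill the reservoir
            continue
        if w <= 0:
            continue                        # zero-weight items are never admitted
        p = min(1.0, k * w / w_sum)         # assumed inclusion probability
        if rng.random() < p:
            reservoir[rng.randrange(k)] = item   # evict a uniformly chosen slot
    return reservoir
```

With equal weights the admission probability reduces to `k / i` for the i-th item, which is the familiar uniform reservoir rule.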

27 Jul, 2022 1 commit
- [Log] fix confusing error log in TCPSocket::Bind() (#4299) · 069068aa
  Rhett Ying authored Jul 27, 2022
```
* [Log] fix confusing error log in TCPSocket::Bind()

* fix lint
```
  069068aa
26 Jul, 2022 1 commit

[Feature] Add CUDA Weighted Randomwalk Sampling (#4243) · 7e6a6b4a

Dewvin authored Jul 26, 2022



* [Feature] Add CUDA Weighted Randomwalk Sampling

* [Feature] Add CUDA Weighted Randomwalk Sampling

* [Feature] Add CUDA Weighted Randomwalk Sampling

* [Feature] Add CUDA Weighted Randomwalk Sampling

* fix empty prob array && enable non-uniform for restart && enable unit tests

* update doc and guide for randomwalk and pinsage

* update comments
Co-authored-by: zhenliangqiu <ubuntu@ip-172-31-24-245.ap-southeast-1.compute.internal>
Co-authored-by: xiny <xiny@nvidia.com>

7e6a6b4a

15 Jul, 2022 1 commit
- decompose (#4259) · 9a7ad16e
  Quan (Andy) Gan authored Jul 15, 2022
  
  9a7ad16e
09 Jul, 2022 1 commit
- [Bugfix] Add CUDA context availability check before setting curand seed (#4223) · 1feec870
  Xin Yao authored Jul 09, 2022
  
  1feec870
07 Jul, 2022 1 commit
- [Performance] Redirect `AllocWorkspace` to PyTorch's allocator if available (#4199) · 9ee7ced5
  Xin Yao authored Jul 07, 2022
  
  9ee7ced5
01 Jul, 2022 2 commits
- [BugFix] check whether etype sorted when sampling (#4198) · dcf16992
  Rhett Ying authored Jul 01, 2022
  
  dcf16992
- [Feature] extend sort_csr/csc_by_tag to edge (#4164) · 6a6597a0
  Rhett Ying authored Jul 01, 2022
```
* [Feature] extend sort_csr/csc_by_tag to edge

* fix test failure in tensorflow

* refine sorting by edges

* fix docstring

* remove unnecessary mem
Co-authored-by: Xin Yao <xiny@nvidia.com>
```
  6a6597a0
29 Jun, 2022 1 commit

[bugfix] Allow communicators of size one when NCCL is missing (#3713) · 1dddaad4

nv-dlasalle authored Jun 28, 2022



* Update nccl communicator for when NCCL is missing

* Use static_cast

* Add doc string

* Fix whitespace

* Restrict unit test to GPU runs
Co-authored-by: Xin Yao <xiny@nvidia.com>

1dddaad4

27 Jun, 2022 2 commits

[Bug][Feature] Added more missing FP16 specializations (#4140) · a5d8460c

ndickson-nvidia authored Jun 27, 2022

* * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas

* * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM

* * Added missing instantiation of DLDataTypeTraits<__half>::dtype

* * Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary

* * Worked around a compile error in some particular setup, where __half can't be constructed on the host side

* * Fixed linter formatting errors

* * Changes to comments as recommended

* * Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)

a5d8460c

[BugFix] fix rpc-related build issue on mac OS (#4168) · 10db5d0b
Rhett Ying authored Jun 27, 2022
```
* [BugFix] fix rpc-related build issue on mac OS

* add warning message

* add warning message
```
10db5d0b

24 Jun, 2022 1 commit

[Performance][Optimizer] Enable using UVA and FP16 with SparseAdam Optimizer (#3885) · 020f0249

nv-dlasalle authored Jun 23, 2022



* Add uva by default to embedding

* More updates

* Update optimizer

* Add new uva functions

* Expose new pinned memory function

* Add unit tests

* Update formatting

* Fix unit test

* Handle auto UVA case when training is on CPU

* Allow per-embedding decisions for whether to use UVA

* Address sparse_optim.py comments

* Remove unused templates

* Update unit test

* Use dgl allocate memory for pinning

* allow automatically unpin

* workaround for d2h copy with a different dtype

* fix linting

* update error message

* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

020f0249

23 Jun, 2022 2 commits

[Bugfix][Rework] Automatically unpin tensors pinned by DGL (rework #3997) (#4135) · 077e002f

Xin Yao authored Jun 23, 2022



* Explicitly unpin tensoradapter allocated arrays

* Undo unrelated change

* Add unit test

* update unit test

* add pinned_by_dgl flag to NDArray::Container

* use dgl.ndarray for holding the pinning status

* update multi-gpu uva inference

* reinterpret cast NDArray::Container* to DLTensor* in MoveAsDLTensor

* update unpin column and examples

* add unit test for unpin column
Co-authored-by: Dominique LaSalle <dlasalle@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>

077e002f

[Fix] Fix compiler warnings - part 1 (#4051) · 1ad65879

Triston authored Jun 22, 2022



* Fix a cub compile error for CUDA 11.5

* Fix comparison of integer expressions of different signedness in coo_sort.cu file

* Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file

* Remove never referenced variable in spmm.cu

* Fix comparison of integer expressions of different signedness in rowwise_pick.h file

* Fix comparison of integer expressions of different signedness in choice.cc file

* Remove never referenced variable col_data in spat_op_impl_coo.cc

* Remove never referenced variable allowed in global_uniform.cc

* Fix comparison of integer expressions of different signedness in graph.cc

* Fix comparison of integer expressions of different signedness in graph_apis.cc

* Fix the un-used ctx variable in ndarray_partition.cc file for cpu only build

* Fix comparison of integer expressions of different signedness in libra_partition.cc

* Fix comparison of integer expressions of different signedness in graph_op.cc
Co-authored-by: Triston Cao <tristonc@nvidia.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

1ad65879

20 Jun, 2022 1 commit
- [Dist] re-try to receive rpc ndarray msg (#4142) · 3ffe0c09
  Rhett Ying authored Jun 20, 2022
  
  3ffe0c09
14 Jun, 2022 1 commit
- [Bugfix] Disable non-atomic atomic operations (#4117) · 473bf15f
  nv-dlasalle authored Jun 14, 2022
```
* Disable non-atomic atomic operations

* Improve error message

* Make error message more friendly
```
  473bf15f
11 Jun, 2022 1 commit

[Fix] Wrap all CUDA runtime API/CUB calls with macro (#4083) · 60b1c992

Xin Yao authored Jun 11, 2022



* Wrap all CUDA runtime API/CUB calls with macro

* remove the usage of explicit cudaMalloc in favor of AllocWorkspace

* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>

60b1c992

08 Jun, 2022 1 commit

[Dist] enable time out when fetching msg (#4043) · cac3720b

Rhett Ying authored Jun 08, 2022

* [Dist] enable time out when fetching msg

* fix lint error

* minor refinements

* improve minor log

* fix dist test

* fix timeout issue in tensorpipe

cac3720b

07 Jun, 2022 1 commit

[Bug][Feature] Added cublasGemm<__half> specialization (#3988) (#4029) · eabcc58e

ndickson-nvidia authored Jun 07, 2022

* * Added specialization of cublasGemm function for `__half` type, to try to address https://github.com/dmlc/dgl/issues/3988



* * Added USE_FP16 guard

* * Added test cases to test_segment_mm, to test newly-added FP16 specialization of cublasGemm

* * Replaced for loop in test_segment_mm with pytest.mark.parametrize, as recommended
Co-authored-by: Xin Yao <xiny@nvidia.com>

eabcc58e

06 Jun, 2022 3 commits

[Bug] Added common operations for FP16 on older GPUs (#4079) · ea44da50

ndickson-nvidia authored Jun 06, 2022

* * Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
* Fixed an issue with previous check for FP16 support

* * Removing FP16 type checks, since they should no longer be needed

* * Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures. Unfortunately, atomicCAS for unsigned short seems to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.

* * Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value

ea44da50

parallelize csr2coo (#4081) · 31a81438
Quan (Andy) Gan authored Jun 06, 2022
```
Co-authored-by: Xin Yao <xiny@nvidia.com>
```
31a81438

wrap all cuda kernel calls with macro (#4066) · 6014623d

Xin Yao authored Jun 06, 2022


Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>

6014623d

28 May, 2022 3 commits
- Change warning message for tensoradapter when not found (#4055) · 9922f41f
  Quan (Andy) Gan authored May 29, 2022
```
* change warning message

* Update tensordispatch.cc
```
  9922f41f
- Revert "[bugfix] Explicitly unpin tensoradapter allocated arrays (#3997)" (#4061) · 00c09b9f
  Quan (Andy) Gan authored May 28, 2022
```
This reverts commit fdd1fe19.
```
  00c09b9f
- add sanity check (#4050) · c577dc9f
  Quan (Andy) Gan authored May 28, 2022
  
  c577dc9f
26 May, 2022 1 commit

[Build][Tests] Enable FP16 for GPU builds in CI (#4030) · 7a065a9c

nv-dlasalle authored May 26, 2022

* Enable FP16 for GPU builds in CI

* Limit default GPU archs to pascal and above

* Disable FP16 dispatching for cuda architectures less than 60

* Fix linting

* Fix typos

7a065a9c

25 May, 2022 1 commit
- [Bugfix] Cython CAPI holding GIL causes deadlock when Python callback is asynchronous (#4036) · 3c129ad7
  Minjie Wang authored May 25, 2022
```
* cython nogil

* move APIs to internal and add unit test

* fix lint

* disable callback array test
```
  3c129ad7
17 May, 2022 1 commit

change the curandState and launch dimension of CSRRowwiseSample kernel (#3990) · bacf2ab4

paoxiaode authored May 17, 2022



* Change the curand_init parameter

* Change the curand_init parameter

* commit

* commit

* change the curandState and launch dim of CSRRowwiseSample kernel

* commit

* keep _CSRRowWiseSampleReplaceKernel in sync
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>

bacf2ab4

16 May, 2022 2 commits
- [bugfix] Explicitly unpin tensoradapter allocated arrays (#3997) · fdd1fe19
  nv-dlasalle authored May 16, 2022
```
* Explicitly unpin tensoradapter allocated arrays

* Undo unrelated change

* Add unit test

* update unit test
```
  fdd1fe19
- [Performance] Remove unnecessary induced vertices in EdgeSubgraph (#3978) · 03024f95
  Xin Yao authored May 16, 2022
```
* remove unnecessary induced vertices in EdgeSubgraph

* add unit test
```
  03024f95
12 May, 2022 1 commit
- Fix launch parameters index select kernel in sparse push (#3524) · 4177f729
  nv-dlasalle authored May 12, 2022
  
  4177f729
11 May, 2022 1 commit

[Dist] Enable maximum try times for socket backend via DGL_DIST_MAX_T… (#3977) · 22e218d3

Rhett Ying authored May 11, 2022

* [Dist] Enable maximum try times for socket backend via DGL_DIST_MAX_TRY_TIMES

* reset env before/after test

* print log for info when trying to connect

* fix

* print log in python instead of cpp

22e218d3
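
Note on #3977: the commit caps the number of connection attempts through the `DGL_DIST_MAX_TRY_TIMES` environment variable. The sketch below shows one way such a cap can be read and applied. Only the variable name comes from the commit; the function name, the default of 5 attempts, and the injected `connect` callable are assumptions made for illustration and do not reflect DGL's socket backend.

```python
import os
import time

ENV_VAR = "DGL_DIST_MAX_TRY_TIMES"  # name taken from the commit message


def connect_with_retries(connect, default_max_tries=5, delay_s=0.0):
    """Call `connect()` until it succeeds or the retry budget is spent."""
    max_tries = int(os.environ.get(ENV_VAR, default_max_tries))
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return connect()
        except OSError as err:
            last_error = err
            # Log from Python so the user can see progress while waiting.
            print(f"connect attempt {attempt}/{max_tries} failed: {err}")
            time.sleep(delay_s)
    raise ConnectionError(f"gave up after {max_tries} attempts") from last_error
```

Passing the connection routine in as a callable keeps the retry policy separate from the transport, so the same loop can be exercised in tests without opening a real socket.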

27 Apr, 2022 1 commit

[Feature] enable socket net_type for rpc (#3951) · 37be02a4

Rhett Ying authored Apr 28, 2022

* [Feature] enable socket net_type for rpc

* fix lint

* fix lint

* fix build issue on windows

* fix test failure on windows

* fix test failure

* fix cpp unit test failure

* net_type blocking max_try_times

* fix other comments

* fix lint

* fix comment

* fix lint

* fix cpp

37be02a4

26 Apr, 2022 1 commit

[Performance][GPU] Improving Disjoint Union kernel for Graph Dataloaders (#3895) · 6e46bbf5

ayasar70 authored Apr 26, 2022



* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero-based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases

* timing repeatkernel

* clean

* clean

* clean

* updating _SegmentMaskColKernel

* Working on requests: removing sorted array check and adding comments to utility functions

* fixing lint issue

* Optimizing disjoint union kernel

* Trying to resolve compilation issue on CI

* [EMPTY] Relevant commit message here

* applying revision requests on cpu/disjoint_union.cc

* removing unnecessary casts

* remove extra space
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

6e46bbf5
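
Note on #3895: the first bullet describes moving the segment copy from per-row to nonzero-based thread assignment. The sketch below illustrates that idea on the CPU under stated assumptions: each output position `tx` stands in for one GPU thread and finds the row that owns it by a binary search over the output offsets. The function name and the plain-list CSR layout are hypothetical, and the real kernel may differ in detail.

```python
from bisect import bisect_right


def segment_copy_per_nonzero(indptr, data, rows):
    """Gather the entries of the selected CSR rows, one work item per nonzero."""
    # Offsets of each selected row inside the output buffer.
    out_indptr = [0]
    for r in rows:
        out_indptr.append(out_indptr[-1] + indptr[r + 1] - indptr[r])

    out = [None] * out_indptr[-1]
    for tx in range(len(out)):                   # on a GPU, one thread per tx
        seg = bisect_right(out_indptr, tx) - 1   # selected row that owns tx
        src = indptr[rows[seg]] + (tx - out_indptr[seg])
        out[tx] = data[src]
    return out_indptr, out
```

Because the work is split by nonzero rather than by row, a few very long rows no longer leave most threads idle, which is the utilization gain the commit message refers to.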