- 11 Oct, 2022 1 commit
Hongzhi (Steve), Chen authored
* Auto fix c++.
* reformat
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
- 21 Sep, 2022 1 commit
Xin Yao authored
* disable warning for tensorpipe
* fix warning
* enable lint check for cuh files
* resolve comments
- 19 Sep, 2022 2 commits
nv-dlasalle authored
* updates
* Enable caching C++ result
* Add missing docstring
* Remove unused function
* Add unit test
* Address comments
Xin Yao authored
* rename `DLContext` to `DGLContext`
* rename `kDLGPU` to `kDLCUDA`
* replace DLTensor with DGLArray
* fix linting
* Unify DGLType and DLDataType to DGLDataType
* Fix FFI
* rename DLDeviceType to DGLDeviceType
* decouple dlpack from the core library
* fix bug
* fix lint
* fix merge
* fix build
* address comments
* rename dl_converter to dlpack_convert
* remove redundant comments
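A simplified sketch of what "decouple dlpack from the core library" can look like: the core traffics only in its own DGLArray/DGLContext types, and a single translation unit (the renamed dlpack_convert) maps them onto DLPack's structs. All struct definitions below are minimal stand-ins, not DGL's actual headers.

```cuda
#include <cstdint>

// Minimal stand-ins for the post-rename DGL types (simplified).
struct DGLContext { int device_type; int device_id; };
struct DGLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DGLArray {
  void* data; DGLContext ctx; int ndim;
  DGLDataType dtype; int64_t* shape; int64_t* strides;
};

// Minimal stand-ins for DLPack's structs (normally from dlpack/dlpack.h).
struct DLDevice { int device_type; int device_id; };
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
  void* data; DLDevice device; int ndim;
  DLDataType dtype; int64_t* shape; int64_t* strides; uint64_t byte_offset;
};

// The whole DLPack dependency lives in one translation unit; the core
// library only ever sees DGLArray.
DLTensor ToDLPack(const DGLArray& a) {
  return DLTensor{
      a.data,
      {a.ctx.device_type, a.ctx.device_id},
      a.ndim,
      {a.dtype.code, a.dtype.bits, a.dtype.lanes},
      a.shape,
      a.strides,
      /*byte_offset=*/0};
}
```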
- 15 Sep, 2022 1 commit
Xin Yao authored
* add set_stream
* add .record_stream for NDArray and HeteroGraph
* refactor dgl stream Python APIs
* test record_stream
* add unit test for record stream
* use pytorch's stream
* fix lint
* fix cpu build
* address comments
* address comments
* add record stream tests for dgl.graph
* record frames and update dataloader
* add docstring
* update frame
* add backend check for record_stream
* remove CUDAThreadEntry::stream
* record stream for newly created formats
* fix bug
* fix cpp test
* fix None c_void_p to c_handle
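Some context on why `record_stream` exists: a caching allocator only knows the stream a buffer was allocated (and will be freed) on; if a second stream also uses the buffer, the allocator must be told not to recycle the memory until that stream's pending work completes. A minimal CUDA sketch of the underlying mechanism, independent of DGL's actual implementation:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                            \
  do {                                                              \
    cudaError_t err = (call);                                       \
    if (err != cudaSuccess) {                                       \
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
      exit(1);                                                      \
    }                                                               \
  } while (0)

int main() {
  cudaStream_t producer, consumer;
  CHECK_CUDA(cudaStreamCreate(&producer));
  CHECK_CUDA(cudaStreamCreate(&consumer));

  void* buf = nullptr;
  const size_t nbytes = 1 << 20;
  CHECK_CUDA(cudaMalloc(&buf, nbytes));
  CHECK_CUDA(cudaMemsetAsync(buf, 0, nbytes, producer));  // produced on A

  // Order the consumer after the producer before reusing the buffer there.
  cudaEvent_t produced, last_use;
  CHECK_CUDA(cudaEventCreateWithFlags(&produced, cudaEventDisableTiming));
  CHECK_CUDA(cudaEventRecord(produced, producer));
  CHECK_CUDA(cudaStreamWaitEvent(consumer, produced, 0));

  // The buffer is now also used on `consumer`. "Recording" the stream
  // means remembering an event after its last use there, so the allocator
  // will not hand the memory out again until that work has finished.
  CHECK_CUDA(cudaMemsetAsync(buf, 1, nbytes, consumer));
  CHECK_CUDA(cudaEventCreateWithFlags(&last_use, cudaEventDisableTiming));
  CHECK_CUDA(cudaEventRecord(last_use, consumer));

  CHECK_CUDA(cudaEventSynchronize(last_use));  // allocator-side wait
  CHECK_CUDA(cudaFree(buf));
  CHECK_CUDA(cudaEventDestroy(produced));
  CHECK_CUDA(cudaEventDestroy(last_use));
  CHECK_CUDA(cudaStreamDestroy(producer));
  CHECK_CUDA(cudaStreamDestroy(consumer));
  return 0;
}
```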
- 06 Sep, 2022 1 commit
Chang Liu authored
* Use an internal cuda stream for CopyDataFromTo
* small fix white space
* Fix to compile
* Make stream optional in copydata for compile
* fix lint issue
* Update cub functions to use internal stream
* Lint check
* Update CopyTo/CopyFrom/CopyFromTo to use internal stream
* Address comments
* Fix backward CUDA stream
* Avoid overloading CopyFromTo()
* Minor comment update
* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>
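The gist of this change, as a hedged sketch (names illustrative, not DGL's signatures): the copy runs on a dedicated internal stream and synchronizes only that stream, so it neither runs on nor serializes the legacy default stream.

```cuda
#include <cuda_runtime.h>

// Hedged sketch: route a copy through a dedicated internal stream.
void CopyDataFromTo(const void* from, void* to, size_t nbytes,
                    cudaStream_t internal_stream) {
  // cudaMemcpyDefault lets UVA infer the direction (H2D/D2H/D2D).
  cudaMemcpyAsync(to, from, nbytes, cudaMemcpyDefault, internal_stream);
  // Waits on this stream only: in-flight work on user streams (e.g. the
  // backward pass) is neither blocked nor serialized by the copy.
  cudaStreamSynchronize(internal_stream);
}
```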
- 05 Sep, 2022 2 commits
peizhou001 authored
* Enable turning libxsmm on/off at runtime by adding a global config and related API
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
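A sketch of what a runtime libxsmm switch can look like: one process-global flag consulted at kernel-dispatch time, with a setter exposed through the FFI. The names below are hypothetical; DGL's actual config object and API may differ.

```cuda
#include <atomic>

namespace dgl {

std::atomic<bool> g_use_libxsmm{true};  // global config, on by default

void SetUseLibxsmm(bool enable) { g_use_libxsmm.store(enable); }
bool GetUseLibxsmm() { return g_use_libxsmm.load(); }

void SpMMDispatch(/* graph, features, out, ... */) {
  if (GetUseLibxsmm()) {
    // dispatch to the libxsmm-backed CPU kernel
  } else {
    // fall back to the generic CPU kernel
  }
}

}  // namespace dgl
```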
nv-dlasalle authored
* Remove async_transferer
* remove test
* Remove AsyncTransferer
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
- 31 Aug, 2022 1 commit
Xin Yao authored
* Allocate tensors in DGL's current stream
* make tensoradaptor stream-aware
* replace TAempty with cpu allocator
* fix typo
* try fix cpu allocation
* clean header
* redirect AllocDataSpace as well
* resolve comments
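A sketch of the stream-ordered allocation this points at, using the CUDA 11.2+ memory-pool API (whether DGL routes through cudaMallocAsync or its own pool is not stated here; function names are illustrative): memory allocated in the current stream is usable by work queued on that stream without a device-wide sync.

```cuda
#include <cuda_runtime.h>

// Sketch only: stream-ordered allocation via cudaMallocAsync (CUDA 11.2+).
void* AllocDataSpace(size_t nbytes, cudaStream_t stream) {
  void* ptr = nullptr;
  cudaMallocAsync(&ptr, nbytes, stream);
  return ptr;
}

void FreeDataSpace(void* ptr, cudaStream_t stream) {
  // Freed in stream order as well: safe even if prior work queued on
  // `stream` still reads `ptr`.
  cudaFreeAsync(ptr, stream);
}
```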
- 23 Aug, 2022 1 commit
Xin Yao authored
- 18 Aug, 2022 1 commit
Daniil Sizov authored
* Add helper method for temporary affinitization of compute threads
* Rework DL affinitization as single helper
* Add example usage in benchmarks
* Fix python linter warnings
* Fix affinity helper params
* Use NUMA node 0 cores only by default
* Fix benchmarks
* Fix lint errors
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
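A Linux-only sketch of the affinitization primitive such a helper builds on (the real helper is temporary, restores the previous mask afterwards, and covers dataloader/OpenMP workers as well; names are illustrative):

```cuda
#include <pthread.h>
#include <sched.h>
#include <vector>

// Pin the calling thread to an explicit core list, e.g. the cores of
// NUMA node 0 (the commit's default). Returns true on success.
bool PinToCores(const std::vector<int>& cores) {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  for (int c : cores) CPU_SET(c, &mask);
  return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask) == 0;
}
```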
- 15 Aug, 2022 1 commit
Xin Yao authored
- 12 Aug, 2022 1 commit
Xin Yao authored
* Change CUDA_MAX_NUM_THREADS to 256
* change the configuration of grid
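For reference, a typical launch-configuration helper this kind of change flows through; the 256 cap is the commit's value, everything else is illustrative. Smaller blocks often help register-heavy kernels because more blocks can be resident per SM.

```cuda
#include <algorithm>
#include <cstdint>

constexpr int kCudaMaxNumThreads = 256;  // the new cap from this commit

// Illustrative helper: block size capped at 256, grid sized by ceiling
// division so each element gets one thread.
inline void LaunchConfig(int64_t n, int* nthreads, int64_t* nblocks) {
  *nthreads = static_cast<int>(
      std::max<int64_t>(1, std::min<int64_t>(n, kCudaMaxNumThreads)));
  *nblocks = (n + *nthreads - 1) / *nthreads;
}
```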
- 09 Aug, 2022 1 commit
Xin Yao authored
- 01 Aug, 2022 1 commit
Xin Yao authored
* enable use for weighted neighbor sampler and biased random walk
* add unit tests
* fix for mxnet/tf
* fix typo
- 29 Jul, 2022 1 commit
Xin Yao authored
* add weighted sampling without replacement (A-Chao)
* improve Algorithm A-Chao with block-wise prefix sum
* correctly fill out_idxs
* implement weighted sampling with replacement
* small fix
* merge host-side code of weighted/uniform sampling
* enable unit tests for cuda weighted sampling
* move thrust/cub wrapper to the cmake file
* update docs accordingly
* fix linting
* fix linting
* fix unit test
* Bump external CUB/Thrust versions
* Fix code style and update description of algorithm design
* [Feature] GPU support weighted graph neighbor sampling, commit by pengqirong (OPPO)
* merge pengqirong's implementation
* revert the change to cub and thrust
* fix linting
* use DeviceSegmentedSort for better performance
* add more comments
* add necessary notes
* add necessary notes
* resolve some comments
* define THRUST_CUB_WRAPPED_NAMESPACE
* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>
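Algorithm A-Chao is weighted reservoir sampling without replacement: keep a reservoir of k items and admit item i with probability k*w_i/W_i, where W_i is the running weight sum. A simplified single-threaded sketch (the CUDA version parallelizes W_i with the block-wise prefix sum mentioned above; the correction for items with k*w_i/W_i > 1 is omitted here):

```cuda
#include <cstdint>
#include <random>
#include <vector>

// Simplified A-Chao: sample k indices from weights w without replacement.
std::vector<int64_t> AChao(const std::vector<double>& w, int64_t k,
                           std::mt19937& rng) {
  std::uniform_real_distribution<double> uni(0.0, 1.0);
  std::vector<int64_t> reservoir;
  double wsum = 0.0;
  for (int64_t i = 0; i < static_cast<int64_t>(w.size()); ++i) {
    wsum += w[i];  // W_i, the running weight sum
    if (i < k) {
      reservoir.push_back(i);  // the first k items always enter
    } else if (uni(rng) < k * w[i] / wsum) {
      // admit item i with probability k*w_i/W_i, evicting a uniform slot
      reservoir[static_cast<int64_t>(uni(rng) * k)] = i;
    }
  }
  return reservoir;
}
```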
- 27 Jul, 2022 1 commit
Rhett Ying authored
* [Log] fix confusing error log in TCPSocket::Bind()
* fix lint
- 26 Jul, 2022 1 commit
Dewvin authored
* [Feature] Add CUDA Weighted Randomwalk Sampling
* fix empty prob array && enable non-uniform for restart && enable unit tests
* update doc and guide for randomwalk and pinsage
* update comments
Co-authored-by: zhenliangqiu <ubuntu@ip-172-31-24-245.ap-southeast-1.compute.internal>
Co-authored-by: xiny <xiny@nvidia.com>
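A device-side sketch of one weighted random-walk step, sampling a neighbor by inverse CDF over edge weights on a CSR graph; a production kernel would precompute per-node prefix sums and binary-search rather than linear-scan. All names are illustrative.

```cuda
#include <curand_kernel.h>

// Draw u ~ U(0, sum(prob)) over the current node's out-edges and pick the
// neighbor where the running sum crosses u.
__device__ int64_t WeightedStep(const int64_t* indptr, const int64_t* indices,
                                const float* prob, int64_t cur,
                                curandState* state) {
  const int64_t beg = indptr[cur], end = indptr[cur + 1];
  float total = 0.f;
  for (int64_t e = beg; e < end; ++e) total += prob[e];
  if (total <= 0.f) return -1;  // dead end; caller may terminate or restart
  float u = curand_uniform(state) * total;
  for (int64_t e = beg; e < end; ++e) {
    u -= prob[e];
    if (u <= 0.f) return indices[e];
  }
  return indices[end - 1];  // guard against floating-point rounding
}
```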
- 15 Jul, 2022 1 commit
Quan (Andy) Gan authored
- 09 Jul, 2022 1 commit
Xin Yao authored
- 07 Jul, 2022 1 commit
Xin Yao authored
- 01 Jul, 2022 2 commits
Rhett Ying authored
Rhett Ying authored
* [Feature] extend sort_csr/csc_by_tag to edge
* fix test failure in tensorflow
* refine sorting by edges
* fix docstring
* remove unnecessary mem
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 29 Jun, 2022 1 commit
nv-dlasalle authored
* Update nccl communicator for when NCCL is missing
* Use static_cast
* Add doc string
* Fix whitespace
* Restrict unit test to GPU runs
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 27 Jun, 2022 2 commits
ndickson-nvidia authored
* Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas
* Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM
* Added missing instantiation of DLDataTypeTraits<__half>::dtype
* Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary
* Worked around a compile error in some particular setup, where __half can't be constructed on the host side
* Fixed linter formatting errors
* Changes to comments as recommended
* Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
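An illustrative instance of the missing-`__half`-specialization pattern, including the host-side `__half` construction pitfall noted above (converting from float rather than constructing `__half` directly on the host). This mirrors the shape of such code, not DGL's actual `Full` implementation.

```cuda
#include <cstdint>
#include <cuda_fp16.h>

// Generic fill kernel, instantiated for __half as well as float/double.
template <typename T>
__global__ void FullKernel(T* out, T val, int64_t n) {
  const int64_t i =
      static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) out[i] = val;
}

// Host-side entry point: the value is converted with __float2half instead
// of constructing a __half on the host, sidestepping the compile error
// seen on setups where host-side __half construction fails.
void FullHalf(__half* out, float val, int64_t n, cudaStream_t stream) {
  const int nt = 256;
  const int nb = static_cast<int>((n + nt - 1) / nt);
  FullKernel<__half><<<nb, nt, 0, stream>>>(out, __float2half(val), n);
}
```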
Rhett Ying authored
* [BugFix] fix rpc-related build issue on mac OS
* add warning message
* add warning message
- 24 Jun, 2022 1 commit
nv-dlasalle authored
* Add uva by default to embedding
* More updates
* Update optimizer
* Add new uva functions
* Expose new pinned memory function
* Add unit tests
* Update formatting
* Fix unit test
* Handle auto UVA case when training is on CPU
* Allow per-embedding decisions for whether to use UVA
* Address sparse_optim.py comments
* Remove unused templates
* Update unit test
* Use dgl allocate memory for pinning
* allow automatically unpin
* workaround for d2h copy with a different dtype
* fix linting
* update error message
* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
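A hedged sketch of the UVA idea behind these entries: page-lock (pin) the host-resident embedding table so GPU kernels can read it in place across the bus, instead of staging the whole table in device memory. Function names are illustrative, not DGL's API.

```cuda
#include <cuda_runtime.h>

// Pin an existing host buffer and obtain a device-usable view of it.
void* PinForUVA(void* host_ptr, size_t nbytes) {
  cudaHostRegister(host_ptr, nbytes, cudaHostRegisterMapped);
  void* dev_view = nullptr;
  // Under UVA this is the same address; the explicit query is kept for
  // clarity on platforms without unified addressing.
  cudaHostGetDevicePointer(&dev_view, host_ptr, 0);
  return dev_view;
}

void Unpin(void* host_ptr) { cudaHostUnregister(host_ptr); }
```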
- 23 Jun, 2022 2 commits
Xin Yao authored
* Explicitly unpin tensoradapter allocated arrays
* Undo unrelated change
* Add unit test
* update unit test
* add pinned_by_dgl flag to NDArray::Container
* use dgl.ndarray for holding the pinning status
* update multi-gpu uva inference
* reinterpret cast NDArray::Container* to DLTensor* in MoveAsDLTensor
* update unpin column and examples
* add unit test for unpin column
Co-authored-by: Dominique LaSalle <dlasalle@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Triston authored
* Fix a cub compile error for CUDA 11.5
* Fix comparison of integer expressions of different signedness in coo_sort.cu file
* Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file
* Remove never referenced variable in spmm.cu
* Fix comparison of integer expressions of different signedness in rowwise_pick.h file
* Fix comparison of integer expressions of different signedness in choice.cc file
* Remove never referenced variable col_data in spmat_op_impl_coo.cc
* Remove never referenced variable allowed in global_uniform.cc
* Fix comparison of integer expressions of different signedness in graph.cc
* Fix comparison of integer expressions of different signedness in graph_apis.cc
* Fix the un-used ctx variable in ndarray_partition.cc file for cpu only build
* Fix comparison of integer expressions of different signedness in libra_partition.cc
* Fix comparison of integer expressions of different signedness in graph_op.cc
Co-authored-by: Triston Cao <tristonc@nvidia.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 20 Jun, 2022 1 commit
Rhett Ying authored
- 14 Jun, 2022 1 commit
nv-dlasalle authored
* Disable non-atomic atomic operations
* Improve error message
* Make error message more friendly
- 11 Jun, 2022 1 commit
Xin Yao authored
* Wrap all CUDA runtime API/CUB calls with macro
* remove the usage of explicit cudaMalloc in favor of AllocWorkspace
* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>
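The usual shape of such a macro (DGL's is `CUDA_CALL`; the exact message formatting below is illustrative): every runtime/CUB call is checked at the call site, so a failure surfaces immediately with file and line instead of corrupting a later call.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CALL(func)                                    \
  do {                                                     \
    cudaError_t e = (func);                                \
    if (e != cudaSuccess) {                                \
      fprintf(stderr, "CUDA error %s at %s:%d\n",          \
              cudaGetErrorString(e), __FILE__, __LINE__);  \
      abort();                                             \
    }                                                      \
  } while (0)

// Usage: CUDA_CALL(cudaMemsetAsync(ptr, 0, nbytes, stream));
```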
- 08 Jun, 2022 1 commit
Rhett Ying authored
* [dist] enable timeout when fetching msg
* fix lint error
* minor refinements
* improve minor log
* fix dist test
* fix timeout issue in tensorpipe
- 07 Jun, 2022 1 commit
ndickson-nvidia authored
* Added specialization of cublasGemm function for `__half` type, to try to address https://github.com/dmlc/dgl/issues/3988
* Added USE_FP16 guard
* Added test cases to test_segment_mm, to test newly-added FP16 specialization of cublasGemm
* Replaced for loop in test_segment_mm with pytest.mark.parametrize, as recommended
Co-authored-by: Xin Yao <xiny@nvidia.com>
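An illustrative `__half` specialization of a templated GEMM wrapper, routed to cublasHgemm; whether DGL dispatches to cublasHgemm or cublasGemmEx, and with which compute type, is not stated here.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Generic wrapper, specialized per element type elsewhere.
template <typename T>
cublasStatus_t Xgemm(cublasHandle_t h, cublasOperation_t ta,
                     cublasOperation_t tb, int m, int n, int k,
                     const T* alpha, const T* A, int lda, const T* B, int ldb,
                     const T* beta, T* C, int ldc);

// FP16 specialization: forward to the native half-precision GEMM.
template <>
cublasStatus_t Xgemm<__half>(cublasHandle_t h, cublasOperation_t ta,
                             cublasOperation_t tb, int m, int n, int k,
                             const __half* alpha, const __half* A, int lda,
                             const __half* B, int ldb, const __half* beta,
                             __half* C, int ldc) {
  return cublasHgemm(h, ta, tb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}
```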
- 06 Jun, 2022 3 commits
ndickson-nvidia authored
* Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
* Fixed an issue with previous check for FP16 support
* Removing FP16 type checks, since they should no longer be needed
* Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures. Unfortunately, atomicCAS for unsigned short is unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.
* Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value
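The canonical CAS-loop pattern for making AtomicAdd truly atomic where no native instruction exists (this is the CUDA Programming Guide's pre-sm_60 double example, not DGL's exact code); note it returns the old value, matching the last bullet. No 16-bit atomicCAS exists before sm_70, which is why `half` stays non-atomic on old GPUs.

```cuda
#include <cuda_runtime.h>

// Emulate atomicAdd(double*) via atomicCAS on the 64-bit word.
__device__ double AtomicAddDouble(double* address, double val) {
  unsigned long long int* address_as_ull =
      reinterpret_cast<unsigned long long int*>(address);
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    // Retry until no other thread modified the word between read and CAS.
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);  // old value, like native atomicAdd
}
```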
Quan (Andy) Gan authored
Co-authored-by: Xin Yao <xiny@nvidia.com>
Xin Yao authored
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
- 28 May, 2022 3 commits
Quan (Andy) Gan authored
* change warning message
* Update tensordispatch.cc
Quan (Andy) Gan authored
This reverts commit fdd1fe19.
Quan (Andy) Gan authored