Commits · 8ac27dad1a20b4228419e64746ae9110416e34ee · OpenDAS / dgl

07 Nov, 2022 2 commits

[Misc] clang-format auto fix. (#4824) · 8ac27dad

Hongzhi (Steve), Chen authored Nov 07, 2022



* [Misc] clang-format auto fix.

* blabla

* ablabla

* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

8ac27dad

[Misc] Replace /*! with /**. (#4823) · bcd37684

Hongzhi (Steve), Chen authored Nov 07, 2022



* replace

* blabla

* balbla

* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

bcd37684

06 Nov, 2022 2 commits

[Misc] Replace \xxx with @XXX in structured comment. (#4822) · 619d735d

Hongzhi (Steve), Chen authored Nov 07, 2022



* param

* brief

* note

* return

* tparam

* brief2

* file

* return2

* return

* blabla

* all
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

619d735d

[Feature] Add bfloat16 (bf16) support (#4648) · 96297fb8

Xin Yao authored Nov 06, 2022

* add bf16 specializations

* remove SWITCH_BITS

* enable amp for bf16

* remove SWITCH_BITS for cpu kernels

* enbale bf16 based on CUDART

* fix compiling for sm<80

* fix cpu build

* enable unit tests

* update doc

* disable test for CUDA < 11.0

* address comments

* address comments

96297fb8

03 Nov, 2022 2 commits

[Misc] clang-format auto fix. (#4804) · 8ae50c42

Hongzhi (Steve), Chen authored Nov 03, 2022



* [Misc] clang-format auto fix.

* manual

* manual

* manual

* manual

* todo

* fix
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

8ae50c42

[Bugfix] Fix that UVA cannot work on old GPUs (#4781) · 16e771c0
Xin Yao authored Nov 03, 2022
```
* get device pointers

* change if condition to IsPinned
```
16e771c0

28 Oct, 2022 1 commit

[Sampling] Enable sampling with edge masks on homogeneous graph (#4748) · 72781efb

Quan (Andy) Gan authored Oct 28, 2022

* sample neighbors with masks

* oops

* refactor again

* remove

* remove debug code

* rename macro

* address comments

* address comment

* address comments

* rename a lot of stuff

* oops

72781efb

13 Oct, 2022 1 commit

[Deprecation] Dataset Attributes (#4666) · e452179c

Mufei Li authored Oct 13, 2022



* Update from master (#4584)

* [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430)

* Add refactors for multi-gpu and full-graph example

* Fix format

* Update

* Update

* Update

* [Cleanup] Remove async_transferer (#4505)

* Remove async_transferer

* remove test

* Remove AsyncTransferer
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>

* [Cleanup] Remove duplicate entries of CUB submodule   (issue# 4395) (#4499)

* remove third_part/cub

* remove from third_party
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>

* [Bug] Enable turn on/off libxsmm at runtime (#4455)

* enable turn on/off libxsmm at runtime by adding a global config and related API
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>

* [Feature] Unify the cuda stream used in core library (#4480)

* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>

* [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389)

* * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph
* Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information

* * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing

* * Added test to ensure dgl.remove_self_loop function correctly updates batch information

* * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph

* * Formatting

* * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph

* * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass
* Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed

* * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments

* * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test

* * Added doc comments for new parameters

* * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph
* Added test cases for more than k coincident points

* * Updated doc comments for output_batch parameters for clarity

* * Linter formatting fixes

* * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu

* * Rewording in doc comments

* * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [CI] only known devs are authorized to trigger CI (#4518)

* [CI] only known devs are authorized to trigger CI

* fix if author is null

* add comments

* [Readability] Auto fix setup.py and update-version.py (#4446)

* Auto fix update-version

* Auto fix setup.py

* Auto fix update-version

* Auto fix setup.py

* [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438)

* Update distributed-preprocessing.rst

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* fix unpinning when tensoradaptor is not available (#4450)

* [Doc] fix print issue in tutorial (#4459)

* [Example][Refactor] Refactor RGCN example (#4327)

* Refactor full graph entity classification

* Refactor rgcn with sampling

* README update

* Update

* Results update

* Respect default setting of self_loop=false in entity.py

* Update

* Update README

* Update for multi-gpu

* Update

* [doc] fix invalid link in user guide (#4468)

* [Example] directional_GSN for ogbg-molpcba (#4405)

* version-1

* version-2

* version-3

* update examples/README

* Update .gitignore

* update performance in README, delete scripts

* 1st approving review

* 2nd approving review
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

* Clarify the message name, which is 'm'. (#4462)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* [Refactor] Auto fix view.py. (#4461)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Example] SEAL for OGBL (#4291)

* [Example] SEAL for OGBL

* update index

* update

* fix readme typo

* add seal sampler

* modify set ops

* prefetch

* efficiency test

* update

* optimize

* fix ScatterAdd dtype issue

* update sampler style

* update
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* [CI] use https instead of http (#4488)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block()

* fix test failure in TF

* [Feature] Make TensorAdapter Stream Aware (#4472)

* Allocate tensors in DGL's current stream

* make tensoradaptor stream-aware

* replace TAemtpy with cpu allocator

* fix typo

* try fix cpu allocation

* clean header

* redirect AllocDataSpace as well

* resolve comments

* [Build][Doc] Specify the sphinx version (#4465)
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* reformat

* reformat

* Auto fix update-version

* Auto fix setup.py

* reformat

* reformat
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* Move mock version of dgl_sparse library to DGL main repo (#4524)

* init

* Add api doc for sparse library

* support op btwn matrices with differnt sparsity

* Fixed docstring

* addresses comments

* lint check

* change keyword format to fmt
Co-authored-by: Israt Nisa <nisisrat@amazon.com>

* [DistPart] expose timeout config for process group (#4532)

* [DistPart] expose timeout config for process group

* refine code

* Update tools/distpartitioning/data_proc_pipeline.py
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Feature] Import PyTorch's CUDA stream management (#4503)

* add set_stream

* add .record_stream for NDArray and HeteroGraph

* refactor dgl stream Python APIs

* test record_stream

* add unit test for record stream

* use pytorch's stream

* fix lint

* fix cpu build

* address comments

* address comments

* add record stream tests for dgl.graph

* record frames and update dataloder

* add docstring

* update frame

* add backend check for record_stream

* remove CUDAThreadEntry::stream

* record stream for newly created formats

* fix bug

* fix cpp test

* fix None c_void_p to c_handle

* [examples]educe memory consumption (#4558)

* [examples]educe memory consumption

* reffine help message

* refine

* [Feature][REVIEW] Enable DGL cugaph nightly CI  (#4525)

* Added cugraph nightly scripts

* Removed nvcr.io//nvidia/pytorch:22.04-py3 reference
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* Revert "[Feature][REVIEW] Enable DGL cugaph nightly CI  (#4525)" (#4563)

This reverts commit ec171c64

.

* [Misc] Add flake8 lint workflow. (#4566)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.

* Add flake8 annotation in workflow.

* remove

* add

* clean up
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Try use official pylint workflow. (#4568)

* polish update_version

* update pylint workflow.

* add

* revert.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [CI] refine stage logic (#4565)

* [CI] refine stage logic

* refine

* refine

* remove (#4570)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Add Pylint workflow for flake8. (#4571)

* remove

* Add pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Update the python version in Pylint workflow for flake8. (#4572)

* remove

* Add pylint.

* Change the python version for pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4574)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Use another workflow. (#4575)

* Update pylint.

* Use another workflow.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4576)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint.yml

* Update pylint.yml

* Delete pylint.yml

* [Misc]Add pyproject.toml for autopep8 & black. (#4543)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454)

* rename `DLContext` to `DGLContext`

* rename `kDLGPU` to `kDLCUDA`

* replace DLTensor with DGLArray

* fix linting

* Unify DGLType and DLDataType to DGLDataType

* Fix FFI

* rename DLDeviceType to DGLDeviceType

* decouple dlpack from the core library

* fix bug

* fix lint

* fix merge

* fix build

* address comments

* rename dl_converter to dlpack_convert

* remove redundant comments
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

* [Deprecation] Dataset Attributes (#4546)

* Update

* CI

* CI

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* [Example] Bug Fix (#4665)

* Update

* CI

* CI

* Update

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* Update
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

e452179c

11 Oct, 2022 1 commit

[Misc] ClangFormat auto fix. (#4685) · bd3fe59e

Hongzhi (Steve), Chen authored Oct 11, 2022



* Auto fix c++.

* reformat
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

bd3fe59e

21 Sep, 2022 1 commit
- [Fix] Enable lint check for cuh files and fix compiler warnings (#4585) · 880b3b1f
  Xin Yao authored Sep 21, 2022
```
* disable warning for tensorpipe

* fix warning

* enable lint check for cuh files

* resolve comments
```
  880b3b1f
19 Sep, 2022 1 commit

[Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454) · cded5b80

Xin Yao authored Sep 19, 2022

* rename `DLContext` to `DGLContext`

* rename `kDLGPU` to `kDLCUDA`

* replace DLTensor with DGLArray

* fix linting

* Unify DGLType and DLDataType to DGLDataType

* Fix FFI

* rename DLDeviceType to DGLDeviceType

* decouple dlpack from the core library

* fix bug

* fix lint

* fix merge

* fix build

* address comments

* rename dl_converter to dlpack_convert

* remove redundant comments

cded5b80

15 Sep, 2022 1 commit

[Feature] Import PyTorch's CUDA stream management (#4503) · 9a00cf19

Xin Yao authored Sep 15, 2022

* add set_stream

* add .record_stream for NDArray and HeteroGraph

* refactor dgl stream Python APIs

* test record_stream

* add unit test for record stream

* use pytorch's stream

* fix lint

* fix cpu build

* address comments

* address comments

* add record stream tests for dgl.graph

* record frames and update dataloder

* add docstring

* update frame

* add backend check for record_stream

* remove CUDAThreadEntry::stream

* record stream for newly created formats

* fix bug

* fix cpp test

* fix None c_void_p to c_handle

9a00cf19

06 Sep, 2022 1 commit

[Feature] Unify the cuda stream used in core library (#4480) · 1c9d2a03

Chang Liu authored Sep 05, 2022



* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>

1c9d2a03

12 Aug, 2022 1 commit
- [Performance] Improve the performance of SpMMCsr by reconfiguration (#4363) · 2523bc7a
  Xin Yao authored Aug 12, 2022
```
* Change CUDA_MAX_NUM_THREADS to 256

* change the configuration of grid
```
  2523bc7a
09 Aug, 2022 1 commit
- [Bug] Fix broken static_assert (#4342) · 182e1ad5
  Xin Yao authored Aug 09, 2022
  
  182e1ad5
29 Jul, 2022 1 commit

[Feature] Add CUDA Weighted Neighborhood Sampling (#4064) · 86c81b4e

Xin Yao authored Jul 29, 2022



* add weighted sampling without replacement (A-Chao)

* improve Algorithm A-Chao with block-wise prefix sum

* correctly fill out_idxs

* implement weighted sampling with replacement

* small fix

* merge host-side code of weighted/uniform sampling

* enable unit tests for cuda weighted sampling

* move thrust/cub wrapper to the cmake file

* update docs accordingly

* fix linting

* fix linting

* fix unit test

* Bump external CUB/Thrust versions

* Fix code style and update description of algorithm design

* [Feature] GPU support weighted graph neighbor sampling
commit by pengqirong(OPPO)

* merge pengqirong's implementation

* revert the change to cub and thrust

* fix linting

* use DeviceSegmentedSort for better performance

* add more comments

* add necessary notes

* add necessary notes

* resolve some comments

* define THRUST_CUB_WRAPPED_NAMESPACE

* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>

86c81b4e

15 Jul, 2022 1 commit
- decompose (#4259) · 9a7ad16e
  Quan (Andy) Gan authored Jul 15, 2022
  
  9a7ad16e
27 Jun, 2022 1 commit

[Bug][Feature] Added more missing FP16 specializations (#4140) · a5d8460c

ndickson-nvidia authored Jun 27, 2022

* * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas

* * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM

* * Added missing instantiation of DLDataTypeTraits<__half>::dtype

* * Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary

* * Worked around a compile error in some particular setup, where __half can't be constructed on the host side

* * Fixed linter formatting errors

* * Changes to comments as recommended

* * Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)

a5d8460c

24 Jun, 2022 1 commit

[Performance][Optimizer] Enable using UVA and FP16 with SparseAdam Optimizer (#3885) · 020f0249

nv-dlasalle authored Jun 23, 2022



* Add uva by default to embedding

* More updates

* Update optimizer

* Add new uva functions

* Expose new pinned memory function

* Add unit tests

* Update formatting

* Fix unit test

* Handle auto UVA case when training is on CPU

* Allow per-embedding decisions for whether to use UVA

* Address spares_optim.py comments

* Remove unused templates

* Update unit test

* Use dgl allocate memory for pinning

* allow automatically unpin

* workaround for d2h copy with a different dtype

* fix linting

* update error message

* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

020f0249

23 Jun, 2022 1 commit

[Fix] Fix compiler warnings - part 1 (#4051) · 1ad65879

Triston authored Jun 22, 2022



* Fix a cub compile error for CUDA 11.5

* Fix comparison of integer expressions of different signedness in coo_sort.cu file

* Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file

* Remove never referenced variable in spmm.cu

* Fix comparison of integer expressions of different signedness in rowwise_pick.h file

* Fix comparison of integer expressions of different signedness in choice.cc file

* Remove never referenced variable col_data in spat_op_impl_coo.cc

* Remove never referenced variable allowed in global_uniform.cc

* Fix comparison of integer expressions of different signedness in graph.cc

* Fix comparison of integer expressions of different signedness in graph_apis.cc

* Fix the un-used ctx variable in ndarray_partition.cc file for cpu only build

* Fix comparison of integer expressions of different signedness in libra_partition.cc

* Fix comparison of integer expressions of different signedness in graph_op.cc
Co-authored-by: Triston Cao <tristonc@nvidia.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

1ad65879

14 Jun, 2022 1 commit
- [Bugfix] Disable non-atomic atomic operations (#4117) · 473bf15f
  nv-dlasalle authored Jun 14, 2022
```
* Disable non-atomic atomic operations

* Improve error message

* Make error message more friendly
```
  473bf15f
11 Jun, 2022 1 commit

[Fix] Wrap all CUDA runtime API/CUB calls with macro (#4083) · 60b1c992

Xin Yao authored Jun 11, 2022



* Wrap all CUDA runtime API/CUB calls with macro

* remove the usage of explicit cudaMalloc in favor of AllocWorkspace

* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>

60b1c992

07 Jun, 2022 1 commit

[Bug][Feature] Added cublasGemm<__half> specialization (#3988) (#4029) · eabcc58e

ndickson-nvidia authored Jun 07, 2022

* * Added specialization of cublasGemm function for `__half` type, to try to address https://github.com/dmlc/dgl/issues/3988



* * Added USE_FP16 guard

* * Added test cases to test_segment_mm, to test newly-added FP16 specialization of cublasGemm

* * Replaced for loop in test_segment_mm with pytest.mark.parametrize, as recommended
Co-authored-by: Xin Yao <xiny@nvidia.com>

eabcc58e

06 Jun, 2022 2 commits

[Bug] Added common operations for FP16 on older GPUs (#4079) · ea44da50

ndickson-nvidia authored Jun 06, 2022

* * Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
* Fixed an issue with previous check for FP16 support

* * Removing FP16 type checks, since they should no longer be needed

* * Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures.  Unfortunately, it seems that atomicCAS for unsigned short seems to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.

* * Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return old value instead value of new

ea44da50

wrap all cuda kernel calls with macro (#4066) · 6014623d

Xin Yao authored Jun 06, 2022


Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>

6014623d

26 May, 2022 1 commit

[Build][Tests] Enable FP16 for GPU builds in CI (#4030) · 7a065a9c

nv-dlasalle authored May 26, 2022

* Enable FP16 for GPU builds in CI

* Limit default GPU archs to pascal and above

* Disable FP16 dispatching for cuda architectures less than 60

* Fix linting

* Fix typos

7a065a9c

17 May, 2022 1 commit

change the curandState and launch dimension of CSRRowwiseSample kernel (#3990) · bacf2ab4

paoxiaode authored May 17, 2022



* Change the curand_init parameter

* Change the curand_init parameter

* commit

* commit

* change the curandState and launch dim of CSRRowwiseSample kernel

* commit

* keep  _CSRRowWiseSampleReplaceKernel in sync
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>

bacf2ab4

16 May, 2022 1 commit
- [Peformance] Remove unnecessary induced vertices in EdgeSubgraph (#3978) · 03024f95
  Xin Yao authored May 16, 2022
```
* remove unnecessary induced vertices in EdgeSubgraph

* add unit test
```
  03024f95
26 Apr, 2022 1 commit

[Performance][GPU] Improving Disjoint Union kernel for Graph Dataloaders (#3895) · 6e46bbf5

ayasar70 authored Apr 26, 2022



* Based on issue #3436. Improving _SegmentCopyKernel s GPU utilization by switching to nonzero based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases

* timing repeatkernel

* clean

* clean

* clean

* updating _SegmentMaskColKernel

* Working on requests: removing sorted array check and adding comments to utility functions

* fixing lint issue

* Optimizing disjoint union kernel

* Trying to resolve compilation issue on CI

* [EMPTY] Relevant commit message here

* applying revision requests on cpu/disjoint_union.cc

* removing unnecessary casts

* remove extra space
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

6e46bbf5

10 Mar, 2022 1 commit

Change the parameter of curand_init (#3794) · eec219ab

paoxiaode authored Mar 10, 2022



* Change the curand_init parameter

* Change the curand_init parameter

* commit

* commit
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>

eec219ab

28 Feb, 2022 1 commit
- [Build] Split spmm.cu and sddmm.cu for building on Windows (#3789) · 3521fbe9
  Quan (Andy) Gan authored Mar 01, 2022
```
* split files

* fix
```
  3521fbe9
23 Feb, 2022 1 commit

[NN] Rework RelGraphConv and HGTConv (#3742) · 0227ddfb

Minjie Wang authored Feb 23, 2022

* WIP: TypedLinear and new RelGraphConv

* wip

* further simplify RGCN

* a bunch of tweak for performance; add basic cpu support

* update on segmm

* wip: segment.cu

* new backward kernel works

* fix a bunch of bugs in kernel; leave idx_a for future

* add nn test for typed_linear

* rgcn nn test

* bugfix in corner case; update RGCN README

* doc

* fix cpp lint

* fix lint

* fix ut

* wip: hgtconv; presorted flag for rgcn

* hgt code and ut; WIP: some fix on reorder graph

* better typed linear init

* fix ut

* fix lint; add docstring

0227ddfb

21 Feb, 2022 1 commit

[Bugfix] Bug fixes in new dataloader (#3727) · 3f138eba

Quan (Andy) Gan authored Feb 22, 2022



* fixes

* fix

* more fixes

* update

* oops

* lint?

* temporarily revert - will fix in another PR

* more fixes

* skipping mxnet test

* address comments

* fix DDP

* fix edge dataloader exclusion problems

* stupid bug

* fix

* use_uvm option

* fix

* fixes

* fixes

* fixes

* fixes

* add evaluation for cluster gcn and ddp

* stupid bug again

* fixes

* move sanity checks to only support DGLGraphs

* pytorch lightning compatibility fixes

* remove

* poke

* more fixes

* fix

* fix

* disable test

* docstrings

* why is it getting a memory leak?

* fix

* update

* updates and temporarily disable forkingpickler

* update

* fix?

* fix?

* oops

* oops

* fix

* lint

* huh

* uh

* update

* fix

* made it memory efficient

* refine exclude interface

* fix tutorial

* fix tutorial

* fix graph duplication in CPU dataloader workers

* lint

* lint

* Revert "lint"

This reverts commit 805484dd553695111b5fb37f2125214a6b7276e9.

* Revert "lint"

This reverts commit 0bce411b2b415c2ab770343949404498436dc8b2.

* Revert "fix graph duplication in CPU dataloader workers"

This reverts commit 9e3a8cf34c175d3093c773f6bb023b155f2bd27f.
Co-authored-by: xiny <xiny@nvidia.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

3f138eba

18 Feb, 2022 1 commit

[Performance][GPU] Improving _SegmentMaskColKernel (#3745) · 7b9afbfa

ayasar70 authored Feb 18, 2022



* Based on issue #3436. Improving _SegmentCopyKernel s GPU utilization by switching to nonzero based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases

* timing repeatkernel

* clean

* clean

* clean

* updating _SegmentMaskColKernel

* Working on requests: removing sorted array check and adding comments to utility functions

* fixing lint issue
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

7b9afbfa

15 Feb, 2022 1 commit

[Feature] Gather mm (#3641) · b3d3a2c4

Israt Nisa authored Feb 14, 2022



* init

* init

* working cublasGemm

* benchmark high-mem/low-mem, err gather_mm output

* cuda kernel for bmm like kernel

* removed cpu copy for E_per_Rel

* benchmark code from Minjie

* fixed cublas results in gathermm sorted

* use GPU shared mem in unsorted gather mm

* minor

* Added an optimal version of gather_mm_unsorted

* lint

* init gather_mm_scatter

* cublas transpose added

* fixed h_offset for multiple rel

* backward unittest

* cublas support to transpose W

* adding missed file

* forgot to add header file

* lint

* lint

* cleanup

* lint

* docstring

* lint

* added unittest

* lint

* lint

* unittest

* changed err type

* skip cpu test

* skip CPU code

* move in-len loop inside

* lint

* added check different dim length for B

* w_per_len is optional now

* moved gather_mm to pytorch/backend with backward support

* removed a_/b_trans support

* transpose op inside GEMM call

* removed out alloc from API, changed W 2D to 3D

* Added se_gather_mm, Separate API for sortedE

* Fixed gather_mm (unsorted) user interface

* unsorted gmm backward + separate CAPI for un/sorted A

* typecast to float to support atomicAdd

* lint typecast

* lint

* added gather_mm_scatter

* minor

* const

* design changes

* Added idx_a, idx_b support gmm_scatter

* dgl doc

* lint

* adding gather_mm in ops

* lint

* lint

* minor

* removed benchmark files

* minor

* empty commit
Co-authored-by: Israt Nisa <nisisrat@amazon.com>

b3d3a2c4

09 Feb, 2022 1 commit

[Feature] CUDA UVA sampling for MultiLayerNeighborSampler (#3674) · 738e8318

Xin Yao authored Feb 09, 2022



* implement pin_memory/unpin_memory/is_pinned for dgl.graph

* update python docstring

* update c++ docstring

* add test

* fix the broken UnifiedTensor

* XPU_SWITCH for kDLCPUPinned

* a rough version ready for testing

* eliminate extra context parameter for pin/unpin

* update train_sampling

* fix linting

* fix typo

* multi-gpu uva sampling case

* disable new format materialization for pinned graphs

* update python doc for pin_memory_

* fix unit test

* UVA sampling for link prediction

* dispatch most csr ops

* update graphsage example to combine uva sampling and UnifiedTensor

* update graphsage example to combine uva sampling and UnifiedTensor

* update graphsage example to combine uva sampling and UnifiedTensor

* update doc

* update examples

* change unitgraph and heterograph's PinMemory to in-place

* update examples for multi-gpu uva sampling

* update doc

* fix linting

* fix cpu build

* fix is_pinned for DistGraph

* fix is_pinned for DistGraph

* update graphsage unsupervised example

* update doc for gpu sampling

* update some check for sampling device switching

* fix linting

* adapt for new dataloader

* fix linting

* fix

* fix some name issue

* adjust device check

* add unit test for uva sampling & fix some zero_copy bug

* fix linting

* update num_threads in graphsage examples
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

738e8318

21 Jan, 2022 1 commit

[Feature] Pin dgl.graph to the page-locked memory (#3616) · 40b44a43

Xin Yao authored Jan 21, 2022



* implement pin_memory/unpin_memory/is_pinned for dgl.graph

* update python docstring

* update c++ docstring

* add test

* fix the broken UnifiedTensor

* eliminate extra context parameter for pin/unpin

* fix linting

* fix typo

* disable new format materialization for pinned graphs

* update python doc for pin_memory_

* fix unit test

* update doc

* change unitgraph and heterograph's PinMemory to in-place

* update comments for NDArray's PinMemory_ and PinData

* update doc
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

40b44a43

17 Jan, 2022 2 commits
- [Bugfix] Fixes the redundancy parameter being used wrong in global negative sampling (#3657) · 77f4287a
  Quan (Andy) Gan authored Jan 17, 2022
```
* oops

* test
```
  77f4287a
- [Bugfix] Fix GPU global negative sampling code (#3653) · 2aad1c0b
  Quan (Andy) Gan authored Jan 17, 2022
```
* fix GPU global negative sampling code

* Update negative_sampling.cu
```
  2aad1c0b
10 Jan, 2022 1 commit
- disabling cuda11 apis (#3635) · c04b5bc7
  Quan (Andy) Gan authored Jan 10, 2022
  
  c04b5bc7