Commits · 5e78e070f4b3c7a956fd38a09ba91aa9179f8c58 · OpenDAS / dgl

23 Nov, 2023 1 commit
- [Misc] Fix signed unsigned comparison warning (#6602) · 5e78e070
  Muhammed Fatih BALIN authored Nov 22, 2023
  
  5e78e070
22 Nov, 2023 1 commit
- [CUDA] Fix issue about integer overflow (#6586) · bfde1422
  Muhammed Fatih BALIN authored Nov 22, 2023
  
  bfde1422
14 Aug, 2023 1 commit
- [Build] Fix bf16/fp16 building issues for CUDA 12.2 (#6074) · 08d18a47
  Xin Yao authored Aug 14, 2023
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
  08d18a47
10 Aug, 2023 1 commit
- [Bugfix] Fix cusparseCreateCsr format for cuda12 (#6121) · 88964a82
  Chang Liu authored Aug 10, 2023
  
  88964a82
19 Jul, 2023 1 commit
- [Feature] Adding kappa feature for labor (Cooperative Minibatching) (#6006) · d3bd4c61
  Muhammed Fatih BALIN authored Jul 18, 2023
```
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
```
  d3bd4c61
14 Jul, 2023 2 commits
- [Performance][CUDA] Sorting for indices for UVM code path. (#5882) · f5f7e08e
  Muhammed Fatih BALIN authored Jul 14, 2023
  
  f5f7e08e
- [Performance][CUDA] Faster CSRToCOO (#5648) · 83115794
  Muhammed Fatih BALIN authored Jul 14, 2023
```
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
```
  83115794
13 Jul, 2023 1 commit

[Performance][CUDA] Labor UVA optimization (#5885) · c3aea1b6

Muhammed Fatih BALIN authored Jul 13, 2023


Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

c3aea1b6

17 May, 2023 1 commit
- [Performance Improvement] Make GPU sampling and to_block use pinned memory to... · 46af76c3
  nv-dlasalle authored May 17, 2023
```
[Performance Improvement] Make GPU sampling and to_block use pinned memory to decrease required synchronization (#5685)
```
  46af76c3
23 Mar, 2023 1 commit
- [Performance] Creating out buffers for `segment_mm`|`sddmm` via `torch.empty()` (#5462) · 170203ae
  Xin Yao authored Mar 23, 2023
```
* update for segmentMM

* update for sddmm

* fix a bug
```
  170203ae
08 Mar, 2023 1 commit
- Fix compile error on ubuntu22.04_g++11.3.0 (#5434) · b1ec112e
  Rhett Ying authored Mar 08, 2023
  
  b1ec112e
12 Jan, 2023 1 commit
- [Bugfix] Replace global cudaStream in Filter with runtime calls (fix #5153) (#5157) · 751b4c26
  nv-dlasalle authored Jan 12, 2023
```
* Add failing unit test

* Add fix

* Remove extra newline

* skip cpu test
Co-authored-by: Xin Yao <yaox12@outlook.com>
```
  751b4c26
09 Dec, 2022 1 commit

[Bugfix] Fix empty tensors being treated as pinned (#5005) · aad3bd04

Xin Yao authored Dec 09, 2022

* fix empty tensor is treated as pinned

* avoid calling cudaHostGetDevicePointer on nullptr

* update empty array

* add a comment

aad3bd04

06 Dec, 2022 1 commit

Add support for next cusparse release (#4974) · fb223d47

Chang Liu authored Dec 05, 2022

* Add support for next cusparse release

* Fix lint

* Add switch and tune the performance

* Fix lint issue

* Fine tune the heuristics

* Fix lint issue

* Address comments

* Minor fix

* Address comments

fb223d47

24 Nov, 2022 1 commit
- [Cleanup] Remove duplicated _IndexSelect (#4874) · c59000ac
  Xin Yao authored Nov 24, 2022
  
  c59000ac
22 Nov, 2022 2 commits

[Performance] Leverage hashmap to accelerate CSRSliceMatrix<kDGLCUDA, IdType> (#4924) · aa419895

Ping Gong authored Nov 22, 2022



* Leverage hashmap to accelerate CSRSliceMatrix

* fix lint check

* use `min` in cuda_runtime.ch

* fix hash func

* add some comments and adjust the <grid,block> of the _SegmentMaskColKernel kernel

* set device and stream for thrust::for_each

* use thrust::cuda::par_nosync
Co-authored-by: Xin Yao <xiny@nvidia.com>

aa419895
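
For readers unfamiliar with the operation named in #4924, the following is a minimal CPU sketch of slicing a CSR matrix by a row list and a column list, using a hash map for the column lookup. It is an illustration only, not the CUDA kernel from the commit: the function name `csr_slice_matrix`, the plain-list representation, and the assumption that `cols` contains no duplicates are all hypothetical choices made here.

```python
def csr_slice_matrix(indptr, indices, rows, cols):
    """Slice a CSR matrix to the given rows and columns (illustrative sketch).

    Rows and columns are relabelled by their position in `rows` / `cols`.
    Assumes `cols` has no duplicate entries.
    """
    # Hash map from original column id to its position in the sliced matrix.
    col_map = {c: j for j, c in enumerate(cols)}
    out_indptr = [0]
    out_indices = []
    for r in rows:
        # Scan the non-zeros of row r and keep those whose column was requested.
        for pos in range(indptr[r], indptr[r + 1]):
            j = col_map.get(indices[pos])
            if j is not None:
                out_indices.append(j)
        out_indptr.append(len(out_indices))
    return out_indptr, out_indices


# 3x4 matrix: row 0 -> cols {0, 2}, row 1 -> cols {1, 3}, row 2 -> cols {0, 1, 2}
indptr = [0, 2, 4, 7]
indices = [0, 2, 1, 3, 0, 1, 2]
sliced = csr_slice_matrix(indptr, indices, rows=[2, 0], cols=[2, 0])
```

The hash map makes each membership test constant time on average, so the cost is proportional to the number of non-zeros in the selected rows rather than to that number multiplied by the length of `cols`.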

[Feature] (La)yer-Neigh(bor) sampling implementation (#4668) · bf264d00

Muhammed Fatih BALIN authored Nov 21, 2022



* adding LABOR sampling

* add ladies and pladies samplers

* fix compile error after rebase

* add reference for ladies sampler

* Improve ladies implementation.

* weighted labor sampling initial implementation draft
fix indentation and small bug in ladies script

* importance_sampling currently doesn't work with weights

* fix weighted importance sampling

* move labor example into its own folder

* lint fixes

* Improve documentation

* remove examples from the main PR

* fix linting by not using c++17 features

* fix documentation of labor_sampler.py

* update documentation for labor.py

* reformat the labor.py file with black

* fix linting errors

* replace exception use with if

* fix typo in error comment

* fixing win64 build for ci

* fixing weighted implementation, works now.

* fix bug in the weighted case and importance_sampling==0

* address part of the reviews

* remove unused code paths from cuda

* remove unused code path from cpu side

* remove extra features of labor making use of random seed.

* fix exclude_edges bug

* remove pcg and seed logic from cpu implementation, seed logic should still work for cuda.

* minor style change

* refactor CPU implementation, take out the importance_sampling probability computation into a function.

* improve CUDAWorkspaceAllocator

* refactor importance_sampling part out to a function

* minor optimization

* fix linting issue

* Revert "remove pcg and seed logic from cpu implementation, seed logic should still work for cuda."

This reverts commit c250e07ac6d7e13f57e79e8a2c2f098d777378c2.

* Revert "remove extra features of labor making use of random seed."

This reverts commit 7f99034353080308f4783f27d9a08bea343fb796.

* fix the documentation

* disable NIDs

* improve the documentation in the code

* use the stream argument in pcg32 instead of skipping ahead t times, can discard the use of hashmap now since it is faster this way.

* fix linting issue

* address another round of reviews

* further optimize CPU LABOR sampling implementation

* fix linting error

* update the comment

* reformat

* rename and rephrase comment

* fix formatting according to new linting specs

* fix compile error due to renaming, fix linting.

* lint

* rename DGLHeteroGraph to DGLGraph to match master

* replace other occurrences of DGLHeteroGraph to DGLGraph
Co-authored-by: Muhammed Fatih BALIN <m.f.balin@gmail.com>
Co-authored-by: Kaan Sancak <kaansnck@gmail.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>

bf264d00

10 Nov, 2022 1 commit

[Bugfix] Fix that half-precision SpMM produces incorrect results (#4842) · a8f9d5ef

Xin Yao authored Nov 10, 2022

* update accumulator

* rename half to __half

* add bfloat16

* simplify code

* fix another case

* add unit test

* disable half-precision SpMMCoo

* fix lint

a8f9d5ef
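
The bullet "update accumulator" in #4842 does not say what was changed. One common source of wrong results in half-precision reductions is keeping the running sum itself in half precision, and the sketch below illustrates that effect under that assumption. The helper names are hypothetical, Python's `struct` format `"e"` is used to round to IEEE 754 binary16, and a Python float (64-bit) stands in for the wider accumulator.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (half-precision) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_half_accumulator(values):
    """Accumulate with the running sum rounded to half after every add."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def sum_wide_accumulator(values):
    """Accumulate in a wider type and round to half only once at the end."""
    acc = 0.0
    for v in values:
        acc += to_half(v)
    return to_half(acc)


ones = [1.0] * 4096
narrow = sum_half_accumulator(ones)  # stalls once the spacing between halves exceeds 1
wide = sum_wide_accumulator(ones)
```

Half precision has an 11-bit significand, so from 2048 upward adjacent representable values are at least 2 apart and adding 1.0 no longer changes the running sum. The wide accumulator reaches the exact total because only the final result is rounded.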

08 Nov, 2022 1 commit

[Misc] Minor code style fix. (#4843) · cb5e3489

Hongzhi (Steve), Chen authored Nov 08, 2022



* [Misc] Change the max line length for cpp to 80 in lint.

* blabla

* blabla

* blabla

* ablabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

cb5e3489

07 Nov, 2022 4 commits

[Misc] clang-format auto fix. (#4831) · 889798fe

Hongzhi (Steve), Chen authored Nov 07, 2022



* [Misc] clang-format auto fix.

* blabla

* nolint

* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

889798fe

[Misc] Minor code style fix. (#4825) · df089424

Hongzhi (Steve), Chen authored Nov 07, 2022



* blabla

* more

* blabla

* blabla

* ablabla

* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

df089424

[Misc] clang-format auto fix. (#4824) · 8ac27dad

Hongzhi (Steve), Chen authored Nov 07, 2022



* [Misc] clang-format auto fix.

* blabla

* ablabla

* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

8ac27dad

[Misc] Replace /*! with /**. (#4823) · bcd37684

Hongzhi (Steve), Chen authored Nov 07, 2022



* replace

* blabla

* balbla

* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

bcd37684

06 Nov, 2022 2 commits

[Misc] Replace \xxx with @XXX in structured comment. (#4822) · 619d735d

Hongzhi (Steve), Chen authored Nov 07, 2022



* param

* brief

* note

* return

* tparam

* brief2

* file

* return2

* return

* blabla

* all
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

619d735d

[Feature] Add bfloat16 (bf16) support (#4648) · 96297fb8

Xin Yao authored Nov 06, 2022

* add bf16 specializations

* remove SWITCH_BITS

* enable amp for bf16

* remove SWITCH_BITS for cpu kernels

* enable bf16 based on CUDART

* fix compiling for sm<80

* fix cpu build

* enable unit tests

* update doc

* disable test for CUDA < 11.0

* address comments

* address comments

96297fb8

03 Nov, 2022 2 commits

[Misc] clang-format auto fix. (#4804) · 8ae50c42

Hongzhi (Steve), Chen authored Nov 03, 2022



* [Misc] clang-format auto fix.

* manual

* manual

* manual

* manual

* todo

* fix
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

8ae50c42

[Bugfix] Fix that UVA cannot work on old GPUs (#4781) · 16e771c0
Xin Yao authored Nov 03, 2022
```
* get device pointers

* change if condition to IsPinned
```
16e771c0

28 Oct, 2022 1 commit

[Sampling] Enable sampling with edge masks on homogeneous graph (#4748) · 72781efb

Quan (Andy) Gan authored Oct 28, 2022

* sample neighbors with masks

* oops

* refactor again

* remove

* remove debug code

* rename macro

* address comments

* address comment

* address comments

* rename a lot of stuff

* oops

72781efb

13 Oct, 2022 1 commit

[Deprecation] Dataset Attributes (#4666) · e452179c

Mufei Li authored Oct 13, 2022



* Update from master (#4584)

* [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430)

* Add refactors for multi-gpu and full-graph example

* Fix format

* Update

* Update

* Update

* [Cleanup] Remove async_transferer (#4505)

* Remove async_transferer

* remove test

* Remove AsyncTransferer
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>

* [Cleanup] Remove duplicate entries of CUB submodule   (issue# 4395) (#4499)

* remove third_part/cub

* remove from third_party
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>

* [Bug] Enable turn on/off libxsmm at runtime (#4455)

* enable turn on/off libxsmm at runtime by adding a global config and related API
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>

* [Feature] Unify the cuda stream used in core library (#4480)

* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>

* [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389)

* * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph
* Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information

* * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing

* * Added test to ensure dgl.remove_self_loop function correctly updates batch information

* * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph

* * Formatting

* * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph

* * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass
* Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed

* * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments

* * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test

* * Added doc comments for new parameters

* * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph
* Added test cases for more than k coincident points

* * Updated doc comments for output_batch parameters for clarity

* * Linter formatting fixes

* * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu

* * Rewording in doc comments

* * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [CI] only known devs are authorized to trigger CI (#4518)

* [CI] only known devs are authorized to trigger CI

* fix if author is null

* add comments

* [Readability] Auto fix setup.py and update-version.py (#4446)

* Auto fix update-version

* Auto fix setup.py

* Auto fix update-version

* Auto fix setup.py

* [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438)

* Update distributed-preprocessing.rst

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* fix unpinning when tensoradaptor is not available (#4450)

* [Doc] fix print issue in tutorial (#4459)

* [Example][Refactor] Refactor RGCN example (#4327)

* Refactor full graph entity classification

* Refactor rgcn with sampling

* README update

* Update

* Results update

* Respect default setting of self_loop=false in entity.py

* Update

* Update README

* Update for multi-gpu

* Update

* [doc] fix invalid link in user guide (#4468)

* [Example] directional_GSN for ogbg-molpcba (#4405)

* version-1

* version-2

* version-3

* update examples/README

* Update .gitignore

* update performance in README, delete scripts

* 1st approving review

* 2nd approving review
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

* Clarify the message name, which is 'm'. (#4462)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* [Refactor] Auto fix view.py. (#4461)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Example] SEAL for OGBL (#4291)

* [Example] SEAL for OGBL

* update index

* update

* fix readme typo

* add seal sampler

* modify set ops

* prefetch

* efficiency test

* update

* optimize

* fix ScatterAdd dtype issue

* update sampler style

* update
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* [CI] use https instead of http (#4488)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block()

* fix test failure in TF

* [Feature] Make TensorAdapter Stream Aware (#4472)

* Allocate tensors in DGL's current stream

* make tensoradaptor stream-aware

* replace TAemtpy with cpu allocator

* fix typo

* try fix cpu allocation

* clean header

* redirect AllocDataSpace as well

* resolve comments

* [Build][Doc] Specify the sphinx version (#4465)
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* reformat

* reformat

* Auto fix update-version

* Auto fix setup.py

* reformat

* reformat
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* Move mock version of dgl_sparse library to DGL main repo (#4524)

* init

* Add api doc for sparse library

* support ops between matrices with different sparsity

* Fixed docstring

* addresses comments

* lint check

* change keyword format to fmt
Co-authored-by: Israt Nisa <nisisrat@amazon.com>

* [DistPart] expose timeout config for process group (#4532)

* [DistPart] expose timeout config for process group

* refine code

* Update tools/distpartitioning/data_proc_pipeline.py
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Feature] Import PyTorch's CUDA stream management (#4503)

* add set_stream

* add .record_stream for NDArray and HeteroGraph

* refactor dgl stream Python APIs

* test record_stream

* add unit test for record stream

* use pytorch's stream

* fix lint

* fix cpu build

* address comments

* address comments

* add record stream tests for dgl.graph

* record frames and update dataloader

* add docstring

* update frame

* add backend check for record_stream

* remove CUDAThreadEntry::stream

* record stream for newly created formats

* fix bug

* fix cpp test

* fix None c_void_p to c_handle

* [examples] reduce memory consumption (#4558)

* [examples] reduce memory consumption

* refine help message

* refine

* [Feature][REVIEW] Enable DGL cugraph nightly CI (#4525)

* Added cugraph nightly scripts

* Removed nvcr.io//nvidia/pytorch:22.04-py3 reference
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* Revert "[Feature][REVIEW] Enable DGL cugraph nightly CI (#4525)" (#4563)

This reverts commit ec171c64.

* [Misc] Add flake8 lint workflow. (#4566)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.

* Add flake8 annotation in workflow.

* remove

* add

* clean up
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Try use official pylint workflow. (#4568)

* polish update_version

* update pylint workflow.

* add

* revert.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [CI] refine stage logic (#4565)

* [CI] refine stage logic

* refine

* refine

* remove (#4570)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Add Pylint workflow for flake8. (#4571)

* remove

* Add pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Update the python version in Pylint workflow for flake8. (#4572)

* remove

* Add pylint.

* Change the python version for pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4574)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Use another workflow. (#4575)

* Update pylint.

* Use another workflow.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4576)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint.yml

* Update pylint.yml

* Delete pylint.yml

* [Misc]Add pyproject.toml for autopep8 & black. (#4543)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454)

* rename `DLContext` to `DGLContext`

* rename `kDLGPU` to `kDLCUDA`

* replace DLTensor with DGLArray

* fix linting

* Unify DGLType and DLDataType to DGLDataType

* Fix FFI

* rename DLDeviceType to DGLDeviceType

* decouple dlpack from the core library

* fix bug

* fix lint

* fix merge

* fix build

* address comments

* rename dl_converter to dlpack_convert

* remove redundant comments
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

* [Deprecation] Dataset Attributes (#4546)

* Update

* CI

* CI

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* [Example] Bug Fix (#4665)

* Update

* CI

* CI

* Update

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* Update
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

e452179c

11 Oct, 2022 1 commit

[Misc] ClangFormat auto fix. (#4685) · bd3fe59e

Hongzhi (Steve), Chen authored Oct 11, 2022



* Auto fix c++.

* reformat
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

bd3fe59e

21 Sep, 2022 1 commit
- [Fix] Enable lint check for cuh files and fix compiler warnings (#4585) · 880b3b1f
  Xin Yao authored Sep 21, 2022
```
* disable warning for tensorpipe

* fix warning

* enable lint check for cuh files

* resolve comments
```
  880b3b1f
19 Sep, 2022 1 commit

[Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454) · cded5b80

Xin Yao authored Sep 19, 2022

* rename `DLContext` to `DGLContext`

* rename `kDLGPU` to `kDLCUDA`

* replace DLTensor with DGLArray

* fix linting

* Unify DGLType and DLDataType to DGLDataType

* Fix FFI

* rename DLDeviceType to DGLDeviceType

* decouple dlpack from the core library

* fix bug

* fix lint

* fix merge

* fix build

* address comments

* rename dl_converter to dlpack_convert

* remove redundant comments

cded5b80

15 Sep, 2022 1 commit

[Feature] Import PyTorch's CUDA stream management (#4503) · 9a00cf19

Xin Yao authored Sep 15, 2022

* add set_stream

* add .record_stream for NDArray and HeteroGraph

* refactor dgl stream Python APIs

* test record_stream

* add unit test for record stream

* use pytorch's stream

* fix lint

* fix cpu build

* address comments

* address comments

* add record stream tests for dgl.graph

* record frames and update dataloader

* add docstring

* update frame

* add backend check for record_stream

* remove CUDAThreadEntry::stream

* record stream for newly created formats

* fix bug

* fix cpp test

* fix None c_void_p to c_handle

9a00cf19

06 Sep, 2022 1 commit

[Feature] Unify the cuda stream used in core library (#4480) · 1c9d2a03

Chang Liu authored Sep 05, 2022



* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>

1c9d2a03

12 Aug, 2022 1 commit
- [Performance] Improve the performance of SpMMCsr by reconfiguration (#4363) · 2523bc7a
  Xin Yao authored Aug 12, 2022
```
* Change CUDA_MAX_NUM_THREADS to 256

* change the configuration of grid
```
  2523bc7a
09 Aug, 2022 1 commit
- [Bug] Fix broken static_assert (#4342) · 182e1ad5
  Xin Yao authored Aug 09, 2022
  
  182e1ad5
29 Jul, 2022 1 commit

[Feature] Add CUDA Weighted Neighborhood Sampling (#4064) · 86c81b4e

Xin Yao authored Jul 29, 2022



* add weighted sampling without replacement (A-Chao)

* improve Algorithm A-Chao with block-wise prefix sum

* correctly fill out_idxs

* implement weighted sampling with replacement

* small fix

* merge host-side code of weighted/uniform sampling

* enable unit tests for cuda weighted sampling

* move thrust/cub wrapper to the cmake file

* update docs accordingly

* fix linting

* fix linting

* fix unit test

* Bump external CUB/Thrust versions

* Fix code style and update description of algorithm design

* [Feature] GPU support weighted graph neighbor sampling
commit by pengqirong(OPPO)

* merge pengqirong's implementation

* revert the change to cub and thrust

* fix linting

* use DeviceSegmentedSort for better performance

* add more comments

* add necessary notes

* add necessary notes

* resolve some comments

* define THRUST_CUB_WRAPPED_NAMESPACE

* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>

86c81b4e

15 Jul, 2022 1 commit
- decompose (#4259) · 9a7ad16e
  Quan (Andy) Gan authored Jul 15, 2022
  
  9a7ad16e
27 Jun, 2022 1 commit

[Bug][Feature] Added more missing FP16 specializations (#4140) · a5d8460c

ndickson-nvidia authored Jun 27, 2022

* * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas

* * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM

* * Added missing instantiation of DLDataTypeTraits<__half>::dtype

* * Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary

* * Worked around a compile error in some particular setup, where __half can't be constructed on the host side

* * Fixed linter formatting errors

* * Changes to comments as recommended

* * Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)

a5d8460c

24 Jun, 2022 1 commit

[Performance][Optimizer] Enable using UVA and FP16 with SparseAdam Optimizer (#3885) · 020f0249

nv-dlasalle authored Jun 23, 2022



* Add uva by default to embedding

* More updates

* Update optimizer

* Add new uva functions

* Expose new pinned memory function

* Add unit tests

* Update formatting

* Fix unit test

* Handle auto UVA case when training is on CPU

* Allow per-embedding decisions for whether to use UVA

* Address sparse_optim.py comments

* Remove unused templates

* Update unit test

* Use dgl allocate memory for pinning

* allow automatically unpin

* workaround for d2h copy with a different dtype

* fix linting

* update error message

* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

020f0249