- 24 Jul, 2023 1 commit
Muhammed Fatih BALIN authored
Co-authored-by: xiny <xiny@nvidia.com>
- 17 Apr, 2023 1 commit
Xin Yao authored
- 11 Apr, 2023 1 commit
Chang Liu authored
- 08 Mar, 2023 1 commit
Xin Yao authored
* expose GeneratePermutation
* add sparse_all_to_all_push
* add sparse_all_to_all_pull
* add unit test
* handle world_size=1
* remove python nccl wrapper
* remove the nccl dependency
* use pinned memory to speed up D2H copy
* fix lint
* resolve comments
* fix lint
* fix ut
* resolve comments
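A minimal single-process sketch of the push semantics this commit adds: each rank routes (index, value) pairs to the rank that owns the index. This is an illustration of the operation only, not DGL's actual API; the function name and the remainder partition rule are assumptions taken from the commit messages.

```python
import torch

def sparse_all_to_all_push(idx, value, world_size):
    """Model of the push: route (idx, value) pairs to their owner rank."""
    out = []
    for r in range(world_size):
        mask = (idx % world_size) == r        # assumed owner rule: id % world_size
        out.append((idx[mask], value[mask]))  # what rank r would receive
    return out

# world_size=1 degenerates to identity routing, the edge case handled above.
parts = sparse_all_to_all_push(torch.tensor([0, 3, 5]), torch.randn(3, 4), world_size=2)
```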
- 09 Dec, 2022 1 commit
Xin Yao authored
* fix empty tensors being treated as pinned
* avoid calling cudaHostGetDevicePointer on nullptr
* update empty array
* add a comment
- 22 Nov, 2022 1 commit
Muhammed Fatih BALIN authored
* adding LABOR sampling
* add ladies and pladies samplers
* fix compile error after rebase
* add reference for ladies sampler
* Improve ladies implementation.
* weighted labor sampling initial implementation draft; fix indentation and small bug in ladies script
* importance_sampling currently doesn't work with weights
* fix weighted importance sampling
* move labor example into its own folder
* lint fixes
* Improve documentation
* remove examples from the main PR
* fix linting by not using c++17 features
* fix documentation of labor_sampler.py
* update documentation for labor.py
* reformat the labor.py file with black
* fix linting errors
* replace exception use with if
* fix typo in error comment
* fixing win64 build for ci
* fixing weighted implementation, works now.
* fix bug in the weighted case and importance_sampling==0
* address part of the reviews
* remove unused code paths from cuda
* remove unused code path from cpu side
* remove extra features of labor making use of random seed.
* fix exclude_edges bug
* remove pcg and seed logic from cpu implementation, seed logic should still work for cuda.
* minor style change
* refactor CPU implementation, take out the importance_sampling probability computation into a function.
* improve CUDAWorkspaceAllocator
* refactor importance_sampling part out to a function
* minor optimization
* fix linting issue
* Revert "remove pcg and seed logic from cpu implementation, seed logic should still work for cuda." This reverts commit c250e07ac6d7e13f57e79e8a2c2f098d777378c2.
* Revert "remove extra features of labor making use of random seed." This reverts commit 7f99034353080308f4783f27d9a08bea343fb796.
* fix the documentation
* disable NIDs
* improve the documentation in the code
* use the stream argument in pcg32 instead of skipping ahead t times, can discard the use of hashmap now since it is faster this way.
* fix linting issue
* address another round of reviews
* further optimize CPU LABOR sampling implementation
* fix linting error
* update the comment
* reformat
* rename and rephrase comment
* fix formatting according to new linting specs
* fix compile error due to renaming, fix linting.
* lint
* rename DGLHeteroGraph to DGLGraph to match master
* replace other occurrences of DGLHeteroGraph to DGLGraph
Co-authored-by: Muhammed Fatih BALIN <m.f.balin@gmail.com>
Co-authored-by: Kaan Sancak <kaansnck@gmail.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
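A hedged usage sketch of the sampler this commit introduces. `LaborSampler` and its `importance_sampling` argument are inferred from the commit messages (labor.py, importance_sampling) and the DGL dataloading API of that era; treat the exact names as assumptions.

```python
import torch
import dgl

g = dgl.rand_graph(1000, 10000)
# LABOR: layer-dependent, variance-reduced neighbor sampling.
sampler = dgl.dataloading.LaborSampler([5, 5], importance_sampling=0)
loader = dgl.dataloading.DataLoader(
    g, torch.arange(100), sampler, batch_size=32, shuffle=True)
for input_nodes, output_nodes, blocks in loader:
    pass  # feed the sampled blocks to a GNN
```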
- 10 Nov, 2022 1 commit
Xin Yao authored
* update accumulator
* rename half to __half
* add bfloat16
* simplify code
* fix another case
* add unit test
* disable half-precision SpMMCoo
* fix lint
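A hedged sketch exercising the low-precision SpMM path this commit touches. It assumes `dgl.ops.copy_u_sum` accepts bfloat16 features on GPU after the change (via the CSR path, since half-precision SpMMCoo is disabled per the message).

```python
import torch
import dgl

g = dgl.rand_graph(100, 500).to('cuda')
x = torch.randn(100, 16, device='cuda', dtype=torch.bfloat16)
# u->v sum aggregation (SpMM); the accumulation dtype is handled internally.
y = dgl.ops.copy_u_sum(g, x)
```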
- 07 Nov, 2022 2 commits
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix.
* blabla
* nolint
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Hongzhi (Steve), Chen authored
* replace
* blabla
* balbla
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
- 06 Nov, 2022 1 commit
Hongzhi (Steve), Chen authored
* param
* brief
* note
* return
* tparam
* brief2
* file
* return2
* return
* blabla
* all
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
- 04 Nov, 2022 1 commit
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix.
* fix
* manual
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
- 21 Sep, 2022 1 commit
Xin Yao authored
* disable warning for tensorpipe
* fix warning
* enable lint check for cuh files
* resolve comments
- 19 Sep, 2022 1 commit
Xin Yao authored
* rename `DLContext` to `DGLContext`
* rename `kDLGPU` to `kDLCUDA`
* replace DLTensor with DGLArray
* fix linting
* Unify DGLType and DLDataType to DGLDataType
* Fix FFI
* rename DLDeviceType to DGLDeviceType
* decouple dlpack from the core library
* fix bug
* fix lint
* fix merge
* fix build
* address comments
* rename dl_converter to dlpack_convert
* remove redundant comments
- 15 Sep, 2022 1 commit
Xin Yao authored
* add set_stream
* add .record_stream for NDArray and HeteroGraph
* refactor dgl stream Python APIs
* test record_stream
* add unit test for record stream
* use pytorch's stream
* fix lint
* fix cpu build
* address comments
* address comments
* add record stream tests for dgl.graph
* record frames and update dataloader
* add docstring
* update frame
* add backend check for record_stream
* remove CUDAThreadEntry::stream
* record stream for newly created formats
* fix bug
* fix cpp test
* fix None c_void_p to c_handle
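A hedged sketch of the stream-awareness this commit adds, assuming DGLGraph gained a `record_stream` method mirroring `torch.Tensor.record_stream` (the message mentions "add record stream tests for dgl.graph").

```python
import torch
import dgl

g = dgl.rand_graph(1000, 5000).to('cuda')
s = torch.cuda.Stream()
with torch.cuda.stream(s):
    frontier = dgl.sampling.sample_neighbors(
        g, torch.arange(10, device='cuda'), 5)
# Mark the graph's buffers (allocated on stream s) as used on the current
# stream, so the caching allocator does not reuse them too early.
frontier.record_stream(torch.cuda.current_stream())
```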
- 06 Sep, 2022 1 commit
Chang Liu authored
* Use an internal cuda stream for CopyDataFromTo
* small fix white space
* Fix to compile
* Make stream optional in copydata for compile
* fix lint issue
* Update cub functions to use internal stream
* Lint check
* Update CopyTo/CopyFrom/CopyFromTo to use internal stream
* Address comments
* Fix backward CUDA stream
* Avoid overloading CopyFromTo()
* Minor comment update
* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>
- 31 Aug, 2022 1 commit
Xin Yao authored
* Allocate tensors in DGL's current stream
* make tensoradaptor stream-aware
* replace TAempty with cpu allocator
* fix typo
* try fix cpu allocation
* clean header
* redirect AllocDataSpace as well
* resolve comments
- 15 Aug, 2022 1 commit
Xin Yao authored
- 09 Jul, 2022 1 commit
Xin Yao authored
- 07 Jul, 2022 1 commit
Xin Yao authored
- 29 Jun, 2022 1 commit
nv-dlasalle authored
* Update nccl communicator for when NCCL is missing
* Use static_cast
* Add doc string
* Fix whitespace
* Restrict unit test to GPU runs
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 11 Jun, 2022 1 commit
Xin Yao authored
* Wrap all CUDA runtime API/CUB calls with macro
* remove the usage of explicit cudaMalloc in favor of AllocWorkspace
* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>
- 06 Jun, 2022 1 commit
Xin Yao authored
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
- 12 May, 2022 1 commit
nv-dlasalle authored
- 21 Feb, 2022 1 commit
Quan (Andy) Gan authored
* fixes
* fix
* more fixes
* update
* oops
* lint?
* temporarily revert - will fix in another PR
* more fixes
* skipping mxnet test
* address comments
* fix DDP
* fix edge dataloader exclusion problems
* stupid bug
* fix
* use_uvm option
* fix
* fixes
* fixes
* fixes
* fixes
* add evaluation for cluster gcn and ddp
* stupid bug again
* fixes
* move sanity checks to only support DGLGraphs
* pytorch lightning compatibility fixes
* remove
* poke
* more fixes
* fix
* fix
* disable test
* docstrings
* why is it getting a memory leak?
* fix
* update
* updates and temporarily disable forkingpickler
* update
* fix?
* fix?
* oops
* oops
* fix
* lint
* huh
* uh
* update
* fix
* made it memory efficient
* refine exclude interface
* fix tutorial
* fix tutorial
* fix graph duplication in CPU dataloader workers
* lint
* lint
* Revert "lint" This reverts commit 805484dd553695111b5fb37f2125214a6b7276e9.
* Revert "lint" This reverts commit 0bce411b2b415c2ab770343949404498436dc8b2.
* Revert "fix graph duplication in CPU dataloader workers" This reverts commit 9e3a8cf34c175d3093c773f6bb023b155f2bd27f.
Co-authored-by: xiny <xiny@nvidia.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
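A hedged sketch of the DDP dataloading path these fixes target. `NodeDataLoader` with `use_ddp` matches the 0.8-era API the commit appears to touch, but the exact arguments are assumptions; `use_ddp=False` here so the sketch runs standalone.

```python
import torch
import dgl

g = dgl.rand_graph(10000, 100000)
train_nids = torch.arange(1000)
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 10])
# With use_ddp=True, seed nodes are split across DDP ranks each epoch.
loader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler, use_ddp=False, batch_size=1024, shuffle=True)
for input_nodes, output_nodes, blocks in loader:
    pass  # forward/backward on the sampled blocks
```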
- 21 Jan, 2022 1 commit
Xin Yao authored
* implement pin_memory/unpin_memory/is_pinned for dgl.graph
* update python docstring
* update c++ docstring
* add test
* fix the broken UnifiedTensor
* eliminate extra context parameter for pin/unpin
* fix linting
* fix typo
* disable new format materialization for pinned graphs
* update python doc for pin_memory_
* fix unit test
* update doc
* change unitgraph and heterograph's PinMemory to in-place
* update comments for NDArray's PinMemory_ and PinData
* update doc
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
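A hedged sketch of the in-place graph pinning this commit implements. The method names come straight from the message; the zero-copy sampling call at the end is an assumption about how a pinned graph is consumed.

```python
import torch
import dgl

g = dgl.rand_graph(100000, 500000)
g.pin_memory_()        # pin the graph structure in page-locked host memory
assert g.is_pinned()
# A pinned graph can be read by GPU kernels zero-copy (UVA), e.g. sampling
# with GPU seeds while the graph stays in host memory (assumed usage).
seeds = torch.arange(10, device='cuda')
frontier = dgl.sampling.sample_neighbors(g, seeds, 5)
g.unpin_memory_()
```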
- 18 Oct, 2021 1 commit
nv-dlasalle authored
- 15 Oct, 2021 1 commit
David Min authored
* Add pytorch-direct version
* remove
* add documentation for UnifiedTensor
* Revert "add documentation for UnifiedTensor" This reverts commit 63ba42644d4aba197c1cb4ea4b85fa1bc43b8849.
* add boundary check for UVM IndexSelect
* relocate boundary check index kernels to cuda
* fix function name
* fix indexkernel in nccl api
* fix argument ordering
* simplify code
* Add a comment for the uvm version
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
- 29 Sep, 2021 1 commit
Rhett Ying authored
* [Feature] enable create/set/free cuda stream for internal use
* add unit test
* fix unit test failure on mxnet and tf
* refactor stream wrapper
* fix lint error
* fix lint error
- 06 Sep, 2021 1 commit
Jinjing Zhou authored
* remove
* remove
* fix
* remove
* remove
- 19 Aug, 2021 1 commit
nv-dlasalle authored
* Update filter code
* Add unit tests
* Fixes
* Switch to indices
* Rename functions
* Fix linting
* Fix whitespace
* Add doc
* Fix heterograph
* Change workspace allocation
* Fix linting
* Fix docs in filter.py
* Add todo
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 16 Jul, 2021 1 commit
David Min authored
[Feature][Performance][GPU] Introducing UnifiedTensor for efficient zero-copy host memory access from GPU (#3086)
* Add pytorch-direct version
* Initial commit of unified tensor
* Merge branch 'master' of https://github.com/davidmin7/dgl
* Remove unnecessary things
* Fix error message
* Fix/Add descriptions
* whitespace fix
* add unpin
* disable IndexSelectCPUFromGPU with no CUDA
* add a newline for unified_tensor.py
* Apply changes based on feedback
* add 'os' module
* skip unified tensor unit test for cpu only
* Update tests/pytorch/test_unified_tensor.py (Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>)
* reflect feedback
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
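A hedged sketch of the zero-copy access pattern UnifiedTensor enables: features stay in pinned host memory and GPU threads gather rows on demand. `dgl.contrib.UnifiedTensor` and its `device` argument reflect the 0.7-era API as best recalled; treat the exact path and signature as assumptions.

```python
import torch
import dgl.contrib

feat = torch.rand(1_000_000, 128)   # large feature table stays in host RAM
ufeat = dgl.contrib.UnifiedTensor(feat, device=torch.device('cuda'))
idx = torch.randint(0, feat.shape[0], (1024,), device='cuda')
batch = ufeat[idx]   # GPU gathers rows over the interconnect, zero-copy
```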
- 27 Jun, 2021 1 commit
Jinjing Zhou authored
* fix
* remove nvidiasmi
* fix
* fix docs
* fix
* fix
* 1
* fix
* remove
* skip deprecated kernel
* fix
* Revert "skip deprecated kernel" This reverts commit c5ceb7f60dbbaf065b81cc3680757fd611d90ad3.
* fix
- 23 Jun, 2021 1 commit
nv-dlasalle authored
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
- 11 Jun, 2021 1 commit
nv-dlasalle authored
* Split from NCCL PR
* Fix type in comment
* Expand documentation for sparse_all_to_all_push
* Restore previous behavior in example
* Re-work optimizer to use NCCL based on gradient location
* Allow for running with embedding on CPU but using NCCL for gradient exchange
* Optimize single partition case
* Fix pylint errors
* Add missing include
* fix gradient indexing
* Fix line continuation
* Migrate 'first_step'
* Skip tests without enough GPUs to run NCCL
* Improve empty tensor handling for pytorch 1.5
* Fix indentation
* Allow multiple NCCL communicators to coexist
* Improve handling of empty message
* Update python/dgl/nn/pytorch/sparse_emb.py (Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>)
* Update python/dgl/nn/pytorch/sparse_emb.py (Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>)
* Keep empty tensor dimensionless
* th.empty -> th.tensor
* Preserve shape for empty non-zero dimension tensors
* Use shared state, when embedding is shared
* Add support for gathering an embedding
* Fix typo
* Fix more typos
* Fix backend call
* Use NodeDataLoader to take advantage of ddp
* Update training script to share memory
* Only squeeze last dimension
* Better handle empty message
* Keep embedding on the target GPU device if dgl_sparse is false in RGCN example
* Fix typo in comment
* Add asserts
* Improve documentation in example
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
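A hedged sketch of the components this commit rewires: a `dgl.nn.NodeEmbedding` whose sparse gradients go through the NCCL path when they live on GPU. Constructor and call signatures reflect the era's API but should be treated as assumptions; a single process reduces to a plain sparse update.

```python
import torch
from dgl.nn import NodeEmbedding
from dgl.optim import SparseAdam

emb = NodeEmbedding(10000, 64, name='node_emb')  # host-resident embedding table
opt = SparseAdam([emb], lr=0.01)

nids = torch.arange(128)
vec = emb(nids, torch.device('cpu'))  # gather rows for this mini-batch
loss = vec.sum()
loss.backward()
opt.step()  # in a DDP job, GPU gradients would be exchanged via NCCL here
```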
- 20 May, 2021 1 commit
nv-dlasalle authored
[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)
* Split NCCL wrapper from sparse optimizer and sparse embedding
* Add more unit tests for single node nccl
* Fix unit test for tf
* Switch to device histogram
* Fix histogram issues
* Finish migration to histogram
* Handle cases with zero send/receive data
* Start on partition object
* Get compiling
* Updates
* Add unit tests
* Switch to partition object
* Fix linting issues
* Rename partition file
* Add python doc
* Fix python assert and finish doxygen comments
* Remove stubs for range based partition to satisfy pylint
* Wrap unit test in GPU only
* Wrap explicit cuda call in ifdef
* Merge with partition.py
* update docstrings
* Cleanup partition_op
* Add Workspace object
* Switch to using workspace object
* Move last remainder based function out of nccl_api
* Add error messages
* Update docs with examples
* Fix linting errors
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
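A hedged sketch of the wrapper's intended use, assembled from the commit messages (communicator, partition object, sparse all-to-all). The module paths, `Communicator`/`UniqueId` constructors, and `NDArrayPartition` arguments are assumptions, not verified API; it also needs a CUDA build with NCCL.

```python
import torch
from dgl.cuda import nccl
from dgl.partition import NDArrayPartition

# One process per GPU; rank 0 would create the unique ID and broadcast it
# out of band (e.g. via torch.distributed). world_size=1 runs standalone.
world_size, rank = 1, 0
comm = nccl.Communicator(world_size, rank, nccl.UniqueId())

# Remainder partition: node id i belongs to rank i % world_size.
part = NDArrayPartition(10000, world_size, mode='remainder')
idx = torch.arange(64, device='cuda')
grad = torch.randn(64, 16, device='cuda')
# Push (idx, grad) pairs to their owner ranks; with one rank this is a no-op route.
recv_idx, recv_grad = comm.sparse_all_to_all_push(idx, grad, part)
```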
- 22 Mar, 2021 1 commit
nv-dlasalle authored
[Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust due to linking issues with Torch 1.8 (#2758)
* Wrap cub with prefixes and remove thrust
* Using counting iterator
Co-authored-by: Zihao Ye <expye@outlook.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
- 09 Mar, 2021 1 commit
Tianqi Zhang (张天启) authored
* finish graph matching gpu version
* use C++ shuffle
* finish graph matching
* fix bug
* fix bug
* change name and use swap
* upt
* fix format problem
* fix format problem
* stronger test
* upt
* upt
* change python api
* upt
* upt
* format check
* upt
* upt
* fix bug
Co-authored-by: Tong He <hetong007@gmail.com>
- 08 Feb, 2021 1 commit
nv-dlasalle authored
* Add start of to_block gpu implementation
* Pull in more changes from 0.4.2 cuda_to_block
* Move more code to IdArray
* Refactor DeviceNodeMapMaker
* Updates
* get compiling
* Integrate to_block
* Fix ID allocation
* Minor fixes
* Cleanup cuda calls to use cuda_common
* Reduce kernel calls
* Lint cleanup
* Expand documentation
* Remove unused function
* Rename variables for consistency
* Add doxygen comments
* Fix file extension
* Remove raw asynccopy for deviceapi
* Remove unused function
* Fix block/tile configuration
* Add cuda_device_common.cuh
* Add basic hashtable
* Migrate part of hashtable
* Refactor to use external hashtable
* Make functions members
* Format hash table functions
* Migrate duplicate filling
* Move last function over
* Refactor with cu file
* lint c++ code
* Move context check to C++ code
* Use macro switch
* Add missing files
* Update docstring
* update docs
* Move atomic functions
* Refactor hashtable
* Fix linting
* Expand docs
* Fix mismatched argument names
* Switch doxygen comments from using @param to \param
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
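A sketch of the operation this commit moves onto the GPU: `dgl.to_block` compacts a sampled frontier into a bipartite block, and with CUDA inputs it should go through the hashtable-based device path described above (the routing to the GPU implementation is assumed from the commit messages).

```python
import torch
import dgl

g = dgl.rand_graph(1000, 5000).to('cuda')
seeds = torch.arange(10, device='cuda')
frontier = dgl.sampling.sample_neighbors(g, seeds, 5)
# Relabel the frontier's nodes compactly for message passing.
block = dgl.to_block(frontier, dst_nodes=seeds)
```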
- 29 Jan, 2021 1 commit
Quan (Andy) Gan authored
- 28 Jan, 2021 1 commit
Zihao Ye authored
* add tvm as submodule
* compilation is ok but calling fails
* can call now
* pack multiple modules, change names
* upd
* upd
* upd
* fix cmake
* upd
* upd
* upd
* upd
* fix
* relative path
* upd
* upd
* upd
* singleton
* upd
* trigger
* fix
* upd
* count reducible
* upd
* upd
* upd
* upd
* upd
* upd
* upd
* upd
* upd
* only keep related files
* upd
* upd
* upd
* upd
* lint
* lint
* lint
* lint
* pylint
* upd
* upd
* compilation
* fix
* upd
* upd
* upd
* upd
* upd
* upd
* upd doc
* refactor
* fix
* upd number
Co-authored-by: Zhi Lin <linzhilynn@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-78.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-156.us-east-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>