Commits · f1689ad0e12c2d6f4b00b7564b9b81dcc1301a39 · OpenDAS / dgl

15 Sep, 2022 1 commit

[Feature] Import PyTorch's CUDA stream management (#4503) · 9a00cf19

Xin Yao authored Sep 15, 2022

* add set_stream

* add .record_stream for NDArray and HeteroGraph

* refactor dgl stream Python APIs

* test record_stream

* add unit test for record stream

* use pytorch's stream

* fix lint

* fix cpu build

* address comments

* address comments

* add record stream tests for dgl.graph

* record frames and update dataloader

* add docstring

* update frame

* add backend check for record_stream

* remove CUDAThreadEntry::stream

* record stream for newly created formats

* fix bug

* fix cpp test

* fix None c_void_p to c_handle


06 Sep, 2022 1 commit

[Feature] Unify the cuda stream used in core library (#4480) · 1c9d2a03

Chang Liu authored Sep 05, 2022



* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api

Co-authored-by: xiny <xiny@nvidia.com>


23 Jun, 2022 1 commit

[Fix] Fix compiler warnings - part 1 (#4051) · 1ad65879

Triston authored Jun 22, 2022



* Fix a cub compile error for CUDA 11.5

* Fix comparison of integer expressions of different signedness in coo_sort.cu file

* Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file

* Remove never referenced variable in spmm.cu

* Fix comparison of integer expressions of different signedness in rowwise_pick.h file

* Fix comparison of integer expressions of different signedness in choice.cc file

* Remove never referenced variable col_data in spat_op_impl_coo.cc

* Remove never referenced variable allowed in global_uniform.cc

* Fix comparison of integer expressions of different signedness in graph.cc

* Fix comparison of integer expressions of different signedness in graph_apis.cc

* Fix the unused ctx variable in ndarray_partition.cc file for cpu only build

* Fix comparison of integer expressions of different signedness in libra_partition.cc

* Fix comparison of integer expressions of different signedness in graph_op.cc

Co-authored-by: Triston Cao <tristonc@nvidia.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>


06 Jun, 2022 1 commit

wrap all cuda kernel calls with macro (#4066) · 6014623d

Xin Yao authored Jun 06, 2022


Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>


16 May, 2022 1 commit

[Performance] Remove unnecessary induced vertices in EdgeSubgraph (#3978) · 03024f95

Xin Yao authored May 16, 2022

* remove unnecessary induced vertices in EdgeSubgraph

* add unit test

01 Mar, 2022 1 commit

[Build] Working around broken name mangling in MSVC 16.5.5 + CUDA 11.3 (#3790) · 396d7180

Quan (Andy) Gan authored Mar 01, 2022

* fix

* explain

* oops

21 Oct, 2021 1 commit

[Sampling] Implement dgl.compact_graphs() for the GPU (#3423) · a8c81018

Xin Yao authored Oct 21, 2021

* gpu compact graph template

* cuda compact graph draft

* fix typo

* compact graphs

* pass unit test but fail in training

* example using EdgeDataLoader on the GPU

* refactor cuda_compact_graph and cuda_to_block

* update training scripts

* fix linting

* fix linting

* fix exclude_edges for the GPU

* add --data-cpu & fix copyright
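
For readers unfamiliar with what this commit moves to the GPU: graph compaction, as I understand it, drops nodes that no edge touches in any of the input graphs and relabels the remaining nodes with consecutive IDs. The pure-Python sketch below only illustrates that idea. The function name `compact_edge_lists`, the edge-list representation, and the first-seen ordering of node IDs are assumptions made for this example, not DGL's API or its CUDA implementation.

```python
def compact_edge_lists(edge_lists):
    """Relabel every node touched by an edge to a consecutive ID.

    edge_lists: list of edge lists, each a list of (src, dst) pairs that
    share one node ID space (hypothetical stand-in for a list of graphs).
    Returns (relabeled_edge_lists, induced_nodes), where induced_nodes[i]
    is the original ID of new node i.
    """
    mapping = {}          # original ID -> compact ID
    induced_nodes = []    # compact ID -> original ID
    for edges in edge_lists:
        for src, dst in edges:
            for node in (src, dst):
                if node not in mapping:
                    mapping[node] = len(induced_nodes)
                    induced_nodes.append(node)
    relabeled = [
        [(mapping[s], mapping[d]) for s, d in edges] for edges in edge_lists
    ]
    return relabeled, induced_nodes


relabeled, induced = compact_edge_lists([[(5, 9)], [(9, 2)]])
print(relabeled, induced)
```

Nodes that appear in no edge list never enter `mapping`, which is how isolated nodes are eliminated in this sketch.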


16 Sep, 2021 1 commit

[Performance][Feature] Add `src_nodes` parameter to `to_block()` to avoid cost... · 2647afc9

nv-dlasalle authored Sep 15, 2021


[Performance][Feature] Add `src_nodes` parameter to `to_block()` to avoid cost running unique() when available. (#2973)

* Add lhs_nodes as parameter to to_block

* Update unit test

* Switch to simplified node conversion

* Switch lhs_nodes to be in/out parameter

* Update docs

Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
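
The saving described in this commit title can be illustrated with a small pure-Python sketch: when the caller already knows the source node set, the deduplication pass over the edge sources can be skipped. Everything here (the function name `build_block`, the edge-list input, and the convention that destination nodes come first among the sources) is an assumption for illustration, not the actual `to_block()` code.

```python
def build_block(edges, dst_nodes, src_nodes=None):
    """Build a bipartite block with locally relabeled edges.

    edges: list of (src, dst) pairs in original node IDs.
    dst_nodes: list of unique destination node IDs.
    src_nodes: optional precomputed list of unique source node IDs;
        when given, the unique() pass below is skipped entirely.
    Returns (block_edges, src_nodes).
    """
    if src_nodes is None:
        # The costly step: deduplicate sources, keeping dst nodes first.
        src_nodes = list(dst_nodes)
        seen = set(dst_nodes)
        for s, _ in edges:
            if s not in seen:
                seen.add(s)
                src_nodes.append(s)
    src_index = {n: i for i, n in enumerate(src_nodes)}
    dst_index = {n: i for i, n in enumerate(dst_nodes)}
    block_edges = [(src_index[s], dst_index[d]) for s, d in edges]
    return block_edges, src_nodes


edges = [(7, 3), (8, 3), (3, 4)]
print(build_block(edges, dst_nodes=[3, 4]))
print(build_block(edges, dst_nodes=[3, 4], src_nodes=[3, 4, 7, 8]))
```

Both calls produce the same block; the second simply trusts the caller's `src_nodes` instead of recomputing it.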


15 Jun, 2021 1 commit

[Feature] Add NN-descent support for the KNN graph function in dgl (#2941) · 64d0f3f3

Tianqi Zhang (张天启) authored Jun 15, 2021



* add bruteforce impl

* add nn descent implementation

* change doc-string

* remove redundant func

* use local rng for cuda

* fix lint

* fix lint

* fix bug

* fix bug

* wrap nndescent_knn_graph into knn

* fix lint

* change function names

* add comment for dist funcs

* let the compiler do the unrolling

* use better blocksize setting

* remove redundant line

* check the return of the cub calls

Co-authored-by: Tong He <hetong007@gmail.com>


13 Jun, 2021 1 commit

[Performance] Perform to_block on the GPU when the dataloader is created with... · 8b64ae59

nv-dlasalle authored Jun 13, 2021


[Performance] Perform to_block on the GPU when the dataloader is created with a GPU `device`. (#3016)

* add output device for dataloading

* Update dataloader

* Get sampler device from dataloader

* Fix line length

* Update examples

* Fix to_block GPU for empty relation types

* Handle the case where the DistGraph has None for the underlying graph

Co-authored-by: Da Zheng <zhengda1936@gmail.com>


19 May, 2021 1 commit

[Feature] Add bruteforce implementation for KNN with O(Nk) space complexity (#2892) · 5d7e80f4

Tianqi Zhang (张天启) authored May 19, 2021



* add bruteforce impl

* add support for bruteforce-sharemem

* modify python API

* add tests

* change file path

* change python API

* fix lint

* fix test

* also check worst_dist in the last few dim

* use heap and early-stop on CPU

* fix lint

* fix lint

* add device check

* use cuda function to determine max shared mem

* use cuda to determine block info

* add memory free for tmp var

* update doc-string and add dist option

* fix lint

* add more tests

Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
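
The "use heap and early-stop on CPU" and "check worst_dist" items above suggest a familiar pattern, sketched below in plain Python. Each query keeps only a heap of its k best candidates, so working memory grows with (number of queries × k) rather than with a full pairwise distance matrix, and a candidate is abandoned as soon as its partial squared distance reaches the current worst kept distance. The function name `bruteforce_knn` and its signature are assumptions for this illustration; this is not the code from the commit.

```python
import heapq


def bruteforce_knn(data, queries, k):
    """Return, for each query, the indices of its k nearest data points
    (squared Euclidean distance), nearest first."""
    results = []
    for q in queries:
        heap = []  # max-heap emulated with negated distances: (-dist, index)
        for idx, p in enumerate(data):
            # Worst distance currently kept; infinite until the heap is full.
            worst = -heap[0][0] if len(heap) == k else float("inf")
            dist = 0.0
            pruned = False
            for a, b in zip(q, p):
                dist += (a - b) ** 2
                if dist >= worst:
                    # Early stop: this point cannot beat the kept candidates.
                    pruned = True
                    break
            if pruned:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (-dist, idx))
            else:
                heapq.heapreplace(heap, (-dist, idx))
        ordered = sorted((-neg_dist, idx) for neg_dist, idx in heap)
        results.append([idx for _, idx in ordered])
    return results


data = [(0, 0), (1, 0), (5, 5), (0, 2)]
print(bruteforce_knn(data, [(0, 0), (5, 4)], k=2))
```

Ties are resolved in favour of the earlier data point because pruning uses `>=`; that is a choice made for this sketch only.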


08 Feb, 2021 1 commit

[Sampling] Implement `dgl.to_block()` for the GPU (#2339) · bc3a532f

nv-dlasalle authored Feb 07, 2021



* Add start of to_block gpu implementation

* Pull in more changes from 0.4.2 cuda_to_block

* Move more code to IdArray

* Refactor DeviceNodeMapMaker

* Updates

* get compiling

* Integrate to_block

* Fix ID allocation

* Minor fixes

* Cleanup cuda calls to use cuda_common

* Reduce kernel calls

* Lint cleanup

* Expand documentation

* Remove unused function

* Rename variables for consistency

* Add doxygen comments

* Fix file extension

* Remove raw asynccopy for deviceapi

* Remove unused function

* Fix block/tile configuration

* Add cuda_device_common.cuh

* Add basic hashtable

* Migrate part of hashtable

* Refactor to use external hashtable

* Make functions members

* Format hash table functions

* Migrate duplicate filling

* Move last function over

* Refactor with cu file

* lint c++ code

* Move context check to C++ code

* Use macro switch

* Add missing files

* Update docstring

* update docs

* Move atomic functions

* Refactor hashtable

* Fix linting

* Expand docs

* Fix mismatched argument names

* Switch doxygen comments from using @param to \param

Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
