Commits · 02e4cd8b56514e5ab2c6330dc9782bebc573794a · OpenDAS / dgl

26 Jan, 2022 1 commit

[Feature] long live server for multiple client groups (#3645) · 02e4cd8b

Rhett Ying authored Jan 26, 2022

* [Feature] long live server for multiple client groups

* generate globally unique name for DistTensor within DGL automatically

02e4cd8b

21 Jan, 2022 1 commit

[Feature] Pin dgl.graph to the page-locked memory (#3616) · 40b44a43

Xin Yao authored Jan 21, 2022



* implement pin_memory/unpin_memory/is_pinned for dgl.graph

* update python docstring

* update c++ docstring

* add test

* fix the broken UnifiedTensor

* eliminate extra context parameter for pin/unpin

* fix linting

* fix typo

* disable new format materialization for pinned graphs

* update python doc for pin_memory_

* fix unit test

* update doc

* change unitgraph and heterograph's PinMemory to in-place

* update comments for NDArray's PinMemory_ and PinData

* update doc
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

40b44a43

19 Jan, 2022 1 commit

[Fix] reduce error msg, refine fetch logic of available ports (#3658) · e4cb4a37

Rhett Ying authored Jan 19, 2022

* [Fix] reduce error msg, refine fetch logic of available ports

* un-initialize client before sending shutdown request

* fix import error

* print connect failure log only in debug mode

* enable DMLC_LOG_DEBUG=1 in CI

e4cb4a37

17 Jan, 2022 2 commits

[Bugfix] Fixes the redundancy parameter being used wrong in global negative sampling (#3657) · 77f4287a

Quan (Andy) Gan authored Jan 17, 2022

* oops

* test

77f4287a

[Bugfix] Fix GPU global negative sampling code (#3653) · 2aad1c0b

Quan (Andy) Gan authored Jan 17, 2022

* fix GPU global negative sampling code

* Update negative_sampling.cu

2aad1c0b

11 Jan, 2022 2 commits

Pass the std::min argument's type, to avoid the compilation error. (#3637) · b002f8f9

MaoYuan Xian authored Jan 11, 2022



* Pass the std::min argument's type, to avoid the compilation error.

* Update parallel_for.h

* Update negative_sampling.cc
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

b002f8f9

[Feature][Dist] change TP::Receiver/TP::Sender for multiple connections (#3574) · 37467e25

Rhett Ying authored Jan 11, 2022



* [Feature] enable TP::Receiver to wait for any number of senders

* fix random unit test failure

* avoid endless future wait

* fix unit test failure

* fix seg fault when finalize wait in receiver

* [Feature] refactor sender connect logic and remove unnecessary sleeps in unit tests

* fix lint

* release RPCContext resources before process exits

* [Debug] TPReceiver wait start log

* [Debug] add log in get port

* [Debug] add log

* [ReDebug] revert time sleep in unit tests

* [Debug] remove sleep for test_distri,test_mp

* [debug] add more log

* [debug] add listen_booted_ flag

* [debug] restore commented code for queue

* [debug] sleep more in rpc_client

* restore change in tests

* Revert "restore change in tests"

This reverts commit 41a18926d181ec2517069389bfc41de2cc949280.

* Revert "[debug] sleep more in rpc_client"

This reverts commit a908e758eabca0a6ce62eb2e59baea02a840ac67.

* Revert "[debug] restore commented code for queue"

This reverts commit d3f993b3746e6bb6e2cc2f90204dd7e9461c6301.

* Revert "[debug] add listen_booted_ flag"

This reverts commit 244b2167d94942ff2a0acec8823b974975e52580.

* Revert "[debug] add more log"

This reverts commit 4b78447b0a575a824821dc7e25cca2246e6e30e2.

* Revert "[Debug] remove sleep for test_distri,test_mp"

This reverts commit e1df1aadcc8b1c2a0013ed77322ac391a8807612.

* remove debug code

* revert unnecessary change

* revert unnecessary changes

* always reset RPCContext when get started and reset all data

* remove time.sleep in dist tests

* fix lint

* reset envs before each dist test

* reset env properly

* add time sleep when start each server

* sleep for a while when boot server

* replace wait_thread with callback

* fix lint

* add dglconnect handshake check
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

37467e25

10 Jan, 2022 1 commit

disabling cuda11 apis (#3635) · c04b5bc7

Quan (Andy) Gan authored Jan 10, 2022

c04b5bc7

07 Jan, 2022 1 commit

[Feature] Negative sampling (#3599) · 90f10b31

Quan (Andy) Gan authored Jan 07, 2022

* first commit

* a bunch of fixes

* add unique

* lint

* lint

* lint

* address comments

* Update negative_sampler.py

* fix

* description

* address comments and fix

* fix

* replace unique with replace

* test pylint

* Update negative_sampler.py

90f10b31

04 Jan, 2022 1 commit

[Windows] Support NDArray in shared memory on Windows (#3615) · b226fe01

Quan (Andy) Gan authored Jan 04, 2022

* support shared memory on windows

* Update shared_mem.cc

b226fe01

19 Dec, 2021 1 commit

Fix CopyVectorToNDArray in src/c_api_common.h (#3597) · 25538ba4

hirayaku authored Dec 19, 2021



* fix CopyVectorToNDArray

* Fix lint
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

25538ba4

16 Dec, 2021 1 commit

[Feature] Add CUDA support for `min` and `max` reducer in heterogeneous API for unary message functions (#3566) · 70a499e3

Israt Nisa authored Dec 16, 2021

* CUDA support max/min reducer on forward pass

* docstring

* made UpdateGradMinMax_hetero more concise

* reorganized UpdateGradMinMax_hetero

* CUDA kernels for max/min reducer

* variable name

* lint check

* changed CUDA 2D thread mapping to 1D

* removed legacy cusparse for min/max reducer

* git CI issue

* restarting git CI

* adding namespace std
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

70a499e3

15 Dec, 2021 2 commits

[PinSAGESampler] support PinSAGE sampler on GPU (#3567) · dd762a1e

lixiaobai authored Dec 15, 2021



* Feat: support API "randomwalk_topk" in library

* Feat: use the new API "randomwalk_topk" for PinSAGESampler

* Minor

* Minor

* Refactor: modified codes as checker required

* Minor

* Minor

* Minor

* Minor

* Fix: checking errors in RandomWalkTopk

* Refactor: modified the docstring for randomwalk_topk

* change randomwalk_topk to internal

* fix

* rename

* Minor for pinsage.py

* Feat: support randomwalk and SelectPinSageNeighbors on GPU

Port RandomWalk algorithm on GPU,
and port SelectPinSageNeighbors on GPU.

* Feat: support GPU on python APIs

* Feat: remove perf print information in FrequencyHashmap

* Fix: modified the code format

Modified the code format as task_lint.sh suggested

* Feat: let test script support PinSAGESampler on GPU

Let test script support PinSAGESampler on GPU,
minor of "restart_prob".

* Minor

* Minor

* Minor

* Refactor: use the atomic operations from the array module

* Minor: change the long lines

* Refactor: modified the get_node_types for gpu

* Feat: update the contributor date

* Perf: remove unnecessary stream sync

* Feat: support other random walk

But the non-uniform choice is still not supported.

* Fix: add CUDA switch for random walk
Co-authored-by: Quan Gan <coin2028@hotmail.com>

dd762a1e

[DistGNN, Graph partitioning] Libra partition (#3376) · 78e0dae6

Vasimuddin Md authored Dec 15, 2021



* added distgnn plus libra codebase

* Dist application codes

* added comments in partition code. changed the interface of partitioning call.

* updated readme

* create libra partitioning branch for the PR

* removed distgnn files for first PR

* updated kernel.cc

* added libra_partition.cc and moved libra code from kernel.cc to libra_partition.cc

* fixed lint error; merged libra2dgl.py and main_Libra.py to libra_partition.py; added graphsage/distgnn folder and partition script.

* removed libra2dgl.py

* fixed the lint error and cleaned the code.

* revisions due to PR comments. added distgnn/tools, which contains partition routines

* update 2 PR revision I

* fixed errors; also improved the runtime by 10x.

* fixed minor lint error

* fixed some more lints

* PR revision II changed the interface of libra partition function

* rewrite docstring
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

78e0dae6

08 Dec, 2021 1 commit

[Bugfix] Fix SetDevice issue for NeighborMatching (#3341) · d798280f

Tianqi Zhang (张天启) authored Dec 08, 2021

* fix setdevice issue

* change to curand device API
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

d798280f

06 Dec, 2021 2 commits

[RPC] Use tensorpipe for rpc communication (#3335) · a3ce780d

Jinjing Zhou authored Dec 06, 2021

* doesn't know whether works

* add change

* fix

* fix

* fix

* remove

* revert

* lint

* lint

* fix

* revert

* lint

* fix

* only build rpc on linux

* lint

* lint

* fix build on windows

* fix windows

* remove old test

* fix cmake

* Revert "remove old test"

This reverts commit f1ea75c777c34cdc1f08c0589676ba6aee1feb29.

* fix windows

* fix

* fix

* fix indent

* fix indent

* address comment

* fix

* fix

* fix

* fix

* fix

* lint

* fix indent

* fix lint

* add introduction

* fix

* lint

* lint

* add more logs

* fix

* update xbyak for C++14 with gcc5

* Remove channels

* fix

* add test script

* fix

* remove unused file

* fix lint

* add timeout

a3ce780d

[Distributed] Edge-type-specific fanouts for heterogeneous graphs (#3558) · eb08ef38

Quan (Andy) Gan authored Dec 06, 2021

* first commit

* second commit

* spaghetti unit tests

* rewrite test

eb08ef38

03 Dec, 2021 1 commit

[Feature] Add Min/max reducer in heterogeneous API for unary message functions (#3514) · cb0e1103

Israt Nisa authored Dec 03, 2021



* min/max support for forward CPU heterograph

* Added etype with each argU values

* scatter_add needs fix

* added scatter_add_hetero. Grads don't match for max reducer

* storing ntype in argX

* fixing scatter_add_hetero

* hetero matches with torch's scatter add

* works copy_e forward+cpu

* added backward for copy_rhs

* Computes gradient for all node types in one kernel

* bug fix

* unittest for max/min on CPU

* renamed scatter_add_hetero to update_grad_minmax_hetero

* lint check and comment out cuda call for max. Code is for CPU only

* lint check

* replace inf with zero

* minor

* lint check

* removed LIBXSMM code from hetero code

* fixing backward operator of UpdateGradMinMaxHetero

* removed backward from update_grad_minmax_hetero

* docstring

* improved docstring and coding style

* Added pass by pointer for output

* typos and pass by references

* Support for copy_rhs

* Added header <string>

* fix bug in copy_u_max

* Added comments and dimension check of all etypes

* skip mxnet check

* pass by pointer output arrays

* updated docstring
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

cb0e1103

30 Nov, 2021 1 commit

[Performance][GPU] Improve csr2coo.cu:_RepeatKernel() for more robust GPU usage (#3537) · 66a54555

ayasar70 authored Nov 30, 2021



* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero-based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases

* working on repeat

* updating repeat kernel

* removing unnecessary parameter

* cleaning commented line

* cleaning time measures

* cleaning time measurement lines
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

66a54555

29 Nov, 2021 1 commit

[PinSAGE sampler] Adjust the APIs for PinSAGESampler (#3529) · 44f0b5fe

lixiaobai authored Nov 30, 2021



* Feat: support API "randomwalk_topk" in library

* Feat: use the new API "randomwalk_topk" for PinSAGESampler

* Minor

* Minor

* Refactor: modified codes as checker required

* Minor

* Minor

* Minor

* Minor

* Fix: checking errors in RandomWalkTopk

* Refactor: modified the docstring for randomwalk_topk

* change randomwalk_topk to internal

* fix

* rename

* Minor for pinsage.py
Co-authored-by: Quan Gan <coin2028@hotmail.com>

44f0b5fe

17 Nov, 2021 1 commit

[Feature] Added heterograph support to SDDMM_COO and clean up SpMM and SDDMM hetero kernels (#3449) · 2150fcaf

Israt Nisa authored Nov 17, 2021



* Added SDDMMCOO_hetero support

* removed redundant CUDA kernels

* added benchmark for regression test

* fix

* fixed bug for single src node type
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

2150fcaf

15 Nov, 2021 1 commit

[Randomwalk] Fix off-by-one bug in GenericRandomWalk() (#3500) · 2e8b56a3

Eric Kim authored Nov 15, 2021

2e8b56a3

10 Nov, 2021 1 commit

[BugFix] fix in_degree/out_degree computation logic (#3477) · ea8b93f9

Rhett Ying authored Nov 10, 2021

* [BugFix] fix in/out degree computation

* add unit tests

ea8b93f9

06 Nov, 2021 1 commit

[Performance][GPU] Improve _SegmentCopyKernel() (#3470) · 96cd2ee6

ayasar70 authored Nov 06, 2021



* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero-based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

96cd2ee6
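
The "nonzero-based thread assignment" mentioned in the commit above can be illustrated with a small CPU sketch. This is only an illustration under assumptions, not the actual `_SegmentCopyKernel()` code: the function name `segment_copy`, the list-based CSR layout, and the use of a binary search to map an output position back to its row are all hypothetical choices made for this example. Each loop iteration stands in for one GPU thread handling exactly one nonzero, so work stays balanced even when row lengths vary widely.

```python
from bisect import bisect_right


def segment_copy(indptr, data, rows):
    """Copy the CSR segments of the selected rows, one unit of work per nonzero.

    indptr/data follow the usual CSR convention; `rows` lists the rows to copy.
    (Hypothetical helper for illustration only.)
    """
    # Prefix sums of the selected row lengths give each segment's output offset.
    out_ptr = [0]
    for r in rows:
        out_ptr.append(out_ptr[-1] + indptr[r + 1] - indptr[r])

    nnz = out_ptr[-1]
    out = [None] * nnz
    for tx in range(nnz):  # on a GPU, each tx would be a separate thread
        # Find the output segment that contains position tx (skips empty rows).
        seg = bisect_right(out_ptr, tx) - 1
        src = indptr[rows[seg]] + (tx - out_ptr[seg])
        out[tx] = data[src]
    return out_ptr, out


indptr = [0, 2, 2, 5, 6]
data = ["a", "b", "c", "d", "e", "f"]
out_ptr, out = segment_copy(indptr, data, rows=[2, 0, 1])
```

Because `tx` only ranges over `nnz`, no extra bounds check on `tx` is needed inside the loop, which matches the remark in the commit body that the last check could be removed.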

04 Nov, 2021 2 commits

[BugFix] Fix bugs in GPU sampling and enable unit tests for dataloaders on the GPU (#3474) · b717c8bf

Xin Yao authored Nov 05, 2021



* enable unit tests for dataloader on the GPU

* fix compatibility

* copyright

* fix linting
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>

b717c8bf

[Feature] aten::Relabel_() for the GPU (#3445) · d3ae7544

Xin Yao authored Nov 04, 2021



* relabel gpu

* unittest for relabel_ on the GPU

* finish Relabel_ for the GPU

* copyright

* re-enable the unittest for edge_subgraph on the GPU

* fix unittest for tensorflow

* use a fixed number of threads
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

d3ae7544

03 Nov, 2021 1 commit

Update cub for cuda 11.5 compatibility (#3468) · f5102145

nv-dlasalle authored Nov 02, 2021

f5102145

21 Oct, 2021 1 commit

[Sampling] Implement dgl.compact_graphs() for the GPU (#3423) · a8c81018

Xin Yao authored Oct 21, 2021

* gpu compact graph template

* cuda compact graph draft

* fix typo

* compact graphs

* pass unit test but fail in training

* example using EdgeDataLoader on the GPU

* refactor cuda_compact_graph and cuda_to_block

* update training scripts

* fix linting

* fix linting

* fix exclude_edges for the GPU

* add --data-cpu & fix copyright

a8c81018

18 Oct, 2021 2 commits

[Fix] Split nccl sparse push into two groups (#3404) · c560040f

nv-dlasalle authored Oct 18, 2021

c560040f

[Performance] Parallelize CSRSliceRows() (#3409) · aa11aaa4

David Min authored Oct 18, 2021



* parallelize CSRRowSlice()

* use parallel_for for the second loop
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

aa11aaa4

15 Oct, 2021 1 commit

[Bugfix] Add UVM specialized IndexSelect kernels which perform boundary checks (#3293) · 4f5c3aa2

David Min authored Oct 15, 2021



* Add pytorch-direct version

* remove

* add documentation for UnifiedTensor

* Revert "add documentation for UnifiedTensor"

This reverts commit 63ba42644d4aba197c1cb4ea4b85fa1bc43b8849.

* add boundary check for UVM IndexSelect

* relocate boundary check index kernels to cuda

* fix function name

* fix indexkernel in nccl api

* fix argument ordering

* simplify code

* Add a comment for the uvm version
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

4f5c3aa2

14 Oct, 2021 1 commit

[Bugfix] three bugs related to using DGL as a subdirectory(third_party) of another project. (#3379) · 18863069

zexi yuan authored Oct 14, 2021

* [Bugfix] fix a compile error for Debug-BuildType on Windows Platform

When using CMakeLists.txt to build the "Debug" BuildType on the Windows Platform, it has three compile errors (C4716) in the file "dgl\src\runtime\shared_mem.cc":

'dgl::runtime::SharedMemory::CreateNew': must return a value
'dgl::runtime::SharedMemory::Open': must return a value
'dgl::runtime::SharedMemory::Exist': must return a value

* [Bugfix] cmake error "cannot find load file" when DGL is used as a subdirectory on Linux

When DGL is used as a subdirectory in a CMake project, "CMAKE_SOURCE_DIR" refers to the parent (top-level) CMake scope directory, which is not the expected directory.
It may be better to use "CMAKE_CURRENT_SOURCE_DIR" to set "GKLIB_PATH".

* [Bugfix] cmd cmake error when DGL is used as a subdirectory

When DGL is a subdirectory of another project, the WORKING_DIRECTORY of "add_custom_command" at line 255 of "CMakeLists.txt" is incorrect, which causes a cmake "setlocal" error.

18863069

12 Oct, 2021 1 commit

[Bug] check dtype before convert to gk (#3414) · 2d88db5a

Rhett Ying authored Oct 12, 2021

2d88db5a

29 Sep, 2021 1 commit

[Feature] enable create/set/free cuda stream for internal use (#3334) · e234fcfa

Rhett Ying authored Sep 29, 2021

* [Feature] enable create/set/free cuda stream for internal use

* add unit test

* fix unit test failure on mxnet and tf

* refactor stream wrapper

* fix lint error

* fix lint error

e234fcfa

28 Sep, 2021 1 commit

[Feature] Implement one thread multiple socket (#3200) · 5cf48fc6

Jingcheng Yu authored Sep 28, 2021

Co-authored-by: JingchengYu94 <jingchengyu94@gmail.com>

5cf48fc6

22 Sep, 2021 1 commit

[Feature] Graceful handling of exceptions thrown within OpenMP blocks (#3353) · a04a8d06

Quan (Andy) Gan authored Sep 22, 2021

* graceful c++ exception in OpenMP

* credits

* add test
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

a04a8d06

21 Sep, 2021 1 commit

[Feature] Exclude edges in sample_neighbors (#2971) · bc14829f

mszarma authored Sep 21, 2021



* [Feature] Exclude edges in sample_neighbors

Extend the sample_neighbors and sample_frontier APIs to support an exclude_edges parameter.

exclude_edges accepts tensor and dict data. It contains the EIDs of edges that are excluded during neighbor picking for seed nodes, which makes it possible to exclude certain edges during neighborhood sampling.

Added test cases for heterograph and homograph.
RFC issue id: 2944

* compatibility

* fix

* fix
Co-authored-by: Quan Gan <coin2028@hotmail.com>

bc14829f
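
The effect of an exclude_edges argument, as described in the commit above, can be sketched in plain Python. This is a minimal illustration under assumptions and does not call DGL: the function `sample_in_neighbors`, the `in_edges` dictionary layout, and the parameter names are hypothetical and chosen only for this example. The idea shown is that excluded edge IDs are filtered out of each seed's candidate list before up to `fanout` edges are picked.

```python
import random


def sample_in_neighbors(in_edges, seeds, fanout, exclude_eids=(), rng=None):
    """Pick up to `fanout` incoming edges per seed, skipping excluded edge IDs.

    in_edges maps a node to a list of (src, eid) pairs (hypothetical layout).
    """
    rng = rng or random.Random()
    excluded = set(exclude_eids)
    picked = {}
    for seed in seeds:
        # Drop excluded edges first so they can never be sampled.
        candidates = [(src, eid) for src, eid in in_edges.get(seed, [])
                      if eid not in excluded]
        k = min(fanout, len(candidates))
        picked[seed] = rng.sample(candidates, k)
    return picked


in_edges = {
    0: [(1, 10), (2, 11), (3, 12)],
    1: [(0, 13), (2, 14)],
}
result = sample_in_neighbors(in_edges, seeds=[0, 1], fanout=2,
                             exclude_eids=[11, 13], rng=random.Random(0))
```

Filtering before sampling (rather than sampling and then discarding) keeps the number of returned edges as close to `fanout` as the remaining candidates allow.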

17 Sep, 2021 1 commit

[BugFix] initialize data if null when converting from row sorted coo to csr (#3360) · bacc9047

Rhett Ying authored Sep 17, 2021

bacc9047

16 Sep, 2021 1 commit

[Performance][Feature] Add `src_nodes` parameter to `to_block()` to avoid cost running unique() when available. (#2973) · 2647afc9

nv-dlasalle authored Sep 15, 2021

* Add lhs_nodes as parameter to to_block

* Update unit test

* Switch to simplified node conversion

* Switch lhs_nodes to be in/out parameter

* Update docs
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

2647afc9

14 Sep, 2021 1 commit

[Performance] improve coo2csr space complexity when row is not sorted (#3326) · f4c79f7f

Rhett Ying authored Sep 14, 2021



* [Performance] improve coo2csr space complexity when row is not sorted

* [Perf] replace std::vector<> by NDArray

* keep both impl of unsorted coo to csr and choose according to graph density dynamically

* refine criteria to choose between unsorted algorithms
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-27.us-west-2.compute.internal>

f4c79f7f
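
For readers unfamiliar with the conversion this commit touches, the following is a generic counting-based sketch of turning a COO edge list with unsorted rows into CSR. It is an assumption-laden illustration, not either of the two implementations the commit keeps, and it does not reproduce the density-based selection between them; the function name `coo_to_csr` and the list-based representation are hypothetical.

```python
def coo_to_csr(num_rows, row, col):
    """Convert COO (row, col) with unsorted rows to CSR via counting.

    Returns (indptr, indices, eids), where eids maps each CSR slot back to
    the original COO edge position. (Hypothetical helper for illustration.)
    """
    # Count the edges of every row, then prefix-sum into row offsets.
    indptr = [0] * (num_rows + 1)
    for r in row:
        indptr[r + 1] += 1
    for i in range(num_rows):
        indptr[i + 1] += indptr[i]

    indices = [0] * len(row)
    eids = [0] * len(row)
    fill = indptr[:-1].copy()  # next free slot for each row
    for e, (r, c) in enumerate(zip(row, col)):
        pos = fill[r]
        indices[pos] = c
        eids[pos] = e
        fill[r] += 1
    return indptr, indices, eids


indptr, indices, eids = coo_to_csr(3, row=[2, 0, 2, 1], col=[0, 1, 1, 2])
```

Beyond the output arrays, this variant needs only one extra array of length `num_rows`, which is the kind of space trade-off such conversions typically weigh.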