Commits · 70a499e388d77292e859533dad3968525f1a6cdb · OpenDAS / dgl

16 Dec, 2021 1 commit

[Feature] Add CUDA support for `min` and `max` reducer in heterogeneous API... · 70a499e3

Israt Nisa authored Dec 16, 2021


[Feature] Add CUDA support for `min` and `max` reducer in heterogeneous API for unary message functions (#3566)

* CUDA support max/min reducer on forward pass

* docstring

* made UpdateGradMinMax_hetero more concise

* reorganized UpdateGradMinMax_hetero

* CUDA kernels for max/min reducer

* variable name

* lint check

* changed CUDA 2D thread mapping to 1D

* removed legacy cusparse for min/max reducer

* git CI issue

* restarting git CI

* adding namespace std
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

70a499e3

15 Dec, 2021 1 commit

[DistGNN, Graph partitioning] Libra partition (#3376) · 78e0dae6

Vasimuddin Md authored Dec 15, 2021



* added distgnn plus libra codebase

* Dist application codes

* added comments in partition code. changed the interface of partitioning call.

* updated readme

* create libra partitioning branch for the PR

* removed distgnn files for first PR

* updated kernel.cc

* added libra_partition.cc and moved libra code from kernel.cc to libra_partition.cc

* fixed lint error; merged libra2dgl.py and main_Libra.py to libra_partition.py; added graphsage/distgnn folder and partition script.

* removed libra2dgl.py

* fixed the lint error and cleaned the code.

* revisions due to PR comments. added distgnn/tools containing partition routines

* update 2 PR revision I

* fixed errors; also improved the runtime by 10x.

* fixed minor lint error

* fixed some more lints

* PR revision II changed the interface of libra partition function

* rewrite docstring
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

78e0dae6

06 Dec, 2021 1 commit

[Distributed] Edge-type-specific fanouts for heterogeneous graphs (#3558) · eb08ef38

Quan (Andy) Gan authored Dec 06, 2021

* first commit

* second commit

* spaghetti unit tests

* rewrite test

eb08ef38

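
With edge-type-specific fanouts, each edge type gets its own sampling budget instead of one number shared by all types. A minimal sketch of the idea on plain Python dicts, assuming a hypothetical layout of per-edge-type in-neighbor lists and the convention that a fanout of -1 means "take all neighbors"; `sample_neighbors_per_etype` is an illustrative name, not the distributed sampler's API:

```python
import random
from typing import Dict, List, Tuple


# Hypothetical helper, not DGL's API. in_nbrs[etype][v] lists the
# in-neighbors of node v along edge type etype.
def sample_neighbors_per_etype(
    in_nbrs: Dict[str, Dict[int, List[int]]],
    seeds: List[int],
    fanouts: Dict[str, int],
    rng: random.Random,
) -> Dict[str, List[Tuple[int, int]]]:
    """Sample up to fanouts[etype] in-neighbors of each seed, per edge type.

    Assumed convention: a fanout of -1 keeps every neighbor of that type.
    """
    sampled: Dict[str, List[Tuple[int, int]]] = {}
    for etype, adj in in_nbrs.items():
        k = fanouts[etype]  # each edge type uses its own budget
        edges: List[Tuple[int, int]] = []
        for v in seeds:
            nbrs = adj.get(v, [])
            chosen = nbrs if k < 0 or k >= len(nbrs) else rng.sample(nbrs, k)
            edges.extend((u, v) for u in chosen)
        sampled[etype] = edges
    return sampled
```

Keeping the budget in a dict keyed by edge type means that rare relations can be sampled exhaustively while dense ones stay capped.
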
03 Dec, 2021 1 commit

[Feature] Add Min/max reducer in heterogeneous API for unary message functions (#3514) · cb0e1103

Israt Nisa authored Dec 03, 2021



* min/max support for forward CPU heterograph

* Added etype with each argU values

* scatter_add needs fix

* added scatter_add_hetero. Grads don't match for max reducer

* storing ntype in argX

* fixing scatter_add_hetero

* hetero matches with torch's scatter add

* works copy_e forward+cpu

* added backward for copy_rhs

* Computes gradient for all node types in one kernel

* bug fix

* unittest for max/min on CPU

* renamed scatter_add_hetero to update_grad_minmax_hetero

* lint check and comment out cuda call for max. Code is for CPU only

* lint check

* replace inf with zero

* minor

* lint check

* removed LIBXSMM code from hetero code

* fixing backward operator of UpdateGradMinMaxHetero

* removed backward from update_grad_minmax_hetero

* docstring

* improved docstring and coding style

* Added pass by pointer for output

* typos and pass by references

* Support for copy_rhs

* Added header <string>

* fix bug in copy_u_max

* Added comments and dimension check of all etypes

* skip mxnet check

* pass by pointer output arrays

* updated docstring
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

cb0e1103
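
The bullets above describe a max/min reducer that records, for every destination node, both the winning source ID and the node type it came from. Gradients are then routed back through that record. Below is a minimal pure-Python sketch of that bookkeeping for a `copy_u`-style max reduction. It assumes scalar features and a hypothetical layout in which edges are keyed by `(src_ntype, etype)` and all point into one destination node type. `hetero_copy_u_max` and `hetero_copy_u_max_backward` are illustrative names, not DGL functions:

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical layout (not DGL's): (src_ntype, etype) -> (src_ids, dst_ids);
# every edge type points into the same destination node type.
EdgeDict = Dict[Tuple[str, str], Tuple[List[int], List[int]]]
Arg = List[Optional[Tuple[str, int]]]


def hetero_copy_u_max(
    edges: EdgeDict, src_feat: Dict[str, List[float]], num_dst: int
) -> Tuple[List[float], Arg]:
    out = [-math.inf] * num_dst
    # arg[v] records which (src_ntype, src_id) produced out[v].
    arg: Arg = [None] * num_dst
    for (src_ntype, _etype), (src, dst) in edges.items():
        feat = src_feat[src_ntype]
        for u, v in zip(src, dst):
            if feat[u] > out[v]:  # strict '>' keeps the first maximum on ties
                out[v] = feat[u]
                arg[v] = (src_ntype, u)
    # Destinations with no incoming edges get 0 instead of -inf.
    return [0.0 if x == -math.inf else x for x in out], arg


def hetero_copy_u_max_backward(
    grad_out: List[float], arg: Arg, num_src: Dict[str, int]
) -> Dict[str, List[float]]:
    # A single pass routes each gradient to its winning source, for all node types.
    grad = {ntype: [0.0] * n for ntype, n in num_src.items()}
    for v, winner in enumerate(arg):
        if winner is not None:
            ntype, u = winner
            grad[ntype][u] += grad_out[v]
    return grad
```

For a min reducer, flip the comparison and start from `+inf`. Because the argument record stores the source node type, several edge types that share a source type accumulate into the same gradient buffer.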

30 Nov, 2021 1 commit

[Performance][GPU] Improve csr2coo.cu:_RepeatKernel() for more robust GPU usage (#3537) · 66a54555

ayasar70 authored Nov 30, 2021



* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases

* working on repeat

* updating repeat kernel

* removing unnecessary parameter

* cleaning commented line

* cleaning time measures

* cleaning time measurement lines
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

66a54555
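
The core change is switching from one thread per row to one thread per nonzero. Below is a minimal sequential sketch of a nonzero-based "repeat". It assumes the repeat counts are given as an exclusive prefix sum such as a CSR `indptr`. The function names are hypothetical, and the loop body stands in for independent GPU threads:

```python
from bisect import bisect_right
from typing import List, Sequence


# Hypothetical helper, not DGL's kernel.
def repeat_by_offsets(values: Sequence[int], offsets: Sequence[int]) -> List[int]:
    """Output values[r] repeated offsets[r + 1] - offsets[r] times.

    Work is assigned per output element ("per nonzero") rather than per row:
    position i finds its source row by binary search over the prefix sums,
    so one very long row no longer serializes on a single worker.
    """
    total = offsets[-1]
    out = [0] * total
    for i in range(total):  # each iteration is independent, like one GPU thread
        out[i] = values[bisect_right(offsets, i) - 1]
    return out


def csr_row_ids(indptr: Sequence[int]) -> List[int]:
    # Expanding CSR row pointers into COO row indices repeats 0..n-1.
    return repeat_by_offsets(range(len(indptr) - 1), indptr)
```

`bisect_right(...) - 1` returns the last row whose start offset is `<= i`, which correctly skips empty rows. The per-element binary search costs `O(log n)` work but balances load regardless of the degree distribution.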

17 Nov, 2021 1 commit

[Feature] Added heterograph support to SDDMM_COO and clean up SpMM and SDDMM hetero kernels (#3449) · 2150fcaf

Israt Nisa authored Nov 17, 2021



* Added SDDMMCOO_hetero support

* removed redundant CUDA kernels

* added benchmark for regression test

* fix

* fixed bug for single src node type
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

2150fcaf

06 Nov, 2021 1 commit

[Performance][GPU] Improve _SegmentCopyKernel() (#3470) · 96cd2ee6

ayasar70 authored Nov 06, 2021



* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero based thread assignment

* fixing lint issues

* Update cub for cuda 11.5 compatibility (#3468)

* fixing type mismatch

* tx guaranteed to be smaller than nnz. Hence removing last check

* minor: updating comment

* adding three unit tests for csr slice method to cover some corner cases
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

96cd2ee6

04 Nov, 2021 1 commit

[Feature] aten::Relabel_() for the GPU (#3445) · d3ae7544

Xin Yao authored Nov 04, 2021



* relabel gpu

* unittest for Relabel_ on the GPU

* finish Relabel_ for the GPU

* copyright

* re-enable the unittest for edge_subgraph on the GPU

* fix unittest for tensorflow

* use a fixed number of threads
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

d3ae7544
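
`Relabel_` compacts arbitrary IDs spread over several arrays into consecutive IDs, in place, and returns the mapping back to the original IDs. Below is a minimal sequential sketch of those semantics. It assumes new IDs are assigned in order of first appearance. A GPU version would typically build the mapping with a hash table rather than a Python dict:

```python
from typing import Dict, List


# Hypothetical sketch of the semantics, not DGL's implementation.
def relabel_(arrays: List[List[int]]) -> List[int]:
    """Relabel IDs in place to 0..k-1 and return the new-to-old mapping.

    Assumption: new IDs follow first appearance across the arrays, in order.
    """
    old_to_new: Dict[int, int] = {}
    new_to_old: List[int] = []
    for arr in arrays:
        for i, old in enumerate(arr):
            if old not in old_to_new:
                old_to_new[old] = len(new_to_old)
                new_to_old.append(old)
            arr[i] = old_to_new[old]  # overwrite in place
    return new_to_old
```

For example, relabeling `[10, 20, 10]` and `[30, 20]` together yields `[0, 1, 0]` and `[2, 1]`, with mapping `[10, 20, 30]`.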

03 Nov, 2021 1 commit

Update cub for cuda 11.5 compatibility (#3468) · f5102145

nv-dlasalle authored Nov 02, 2021

f5102145

18 Oct, 2021 1 commit

[Performance] Parallelize CSRSliceRows() (#3409) · aa11aaa4

David Min authored Oct 18, 2021



* parallelize CSRRowSlice()

* use parallel_for for the second loop
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

aa11aaa4
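
Slicing rows out of a CSR matrix splits naturally into two passes. The first is a prefix sum over the selected rows' lengths. The second is a set of independent per-row copies, presumably the "second loop" the bullets above move onto `parallel_for`. Below is a minimal Python sketch of that two-pass structure, with a thread pool standing in for the C++ parallel loop. In CPython this demonstrates the independence of the copies, not a speedup. `csr_slice_rows` is an illustrative name:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate
from typing import List, Sequence, Tuple


# Hypothetical helper, not DGL's CSRSliceRows().
def csr_slice_rows(
    indptr: Sequence[int], indices: Sequence[int], rows: Sequence[int]
) -> Tuple[List[int], List[int]]:
    # Pass 1: row lengths and their prefix sum give the output indptr.
    lengths = [indptr[r + 1] - indptr[r] for r in rows]
    new_indptr = [0] + list(accumulate(lengths))
    new_indices = [0] * new_indptr[-1]

    # Pass 2: every selected row writes a disjoint output range, so the
    # copies are independent and can run in parallel.
    def copy_row(k: int) -> None:
        r, start = rows[k], new_indptr[k]
        new_indices[start:start + lengths[k]] = indices[indptr[r]:indptr[r + 1]]

    with ThreadPoolExecutor() as pool:
        list(pool.map(copy_row, range(len(rows))))
    return new_indptr, new_indices
```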

15 Oct, 2021 1 commit

[Bugfix] Add UVM specialized IndexSelect kernels which perform boundary checks (#3293) · 4f5c3aa2

David Min authored Oct 15, 2021



* Add pytorch-direct version

* remove

* add documentation for UnifiedTensor

* Revert "add documentation for UnifiedTensor"

This reverts commit 63ba42644d4aba197c1cb4ea4b85fa1bc43b8849.

* add boundary check for UVM IndexSelect

* relocate boundary check index kernels to cuda

* fix function name

* fix indexkernel in nccl api

* fix argument ordering

* simplify code

* Add a comment for the uvm version
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

4f5c3aa2

17 Sep, 2021 1 commit

[BugFix] initialize data if null when converting from row sorted coo to csr (#3360) · bacc9047

Rhett Ying authored Sep 17, 2021

bacc9047

14 Sep, 2021 1 commit

[Performance] improve coo2csr space complexity when row is not sorted (#3326) · f4c79f7f

Rhett Ying authored Sep 14, 2021



* [Performance] improve coo2csr space complexity when row is not sorted

* [Perf] replace std::vector<> by NDArray

* keep both impl of unsorted coo to csr and choose according to graph density dynamically

* refine criteria to choose btw Unsorted algos
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-27.us-west-2.compute.internal>

f4c79f7f
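
When COO rows are unsorted, building per-row buckets (for example one `std::vector` per row) costs extra memory and many small allocations. A standard alternative that uses O(N + NNZ) space is a counting sort over row IDs. The sketch below illustrates that technique. It is not necessarily the exact algorithm in this commit, which keeps two implementations and chooses between them by graph density. The function name is hypothetical:

```python
from typing import List, Sequence, Tuple


# Hypothetical helper illustrating counting-sort COO -> CSR, not DGL's code.
def coo_to_csr_unsorted(
    num_rows: int, row: Sequence[int], col: Sequence[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Counting-sort conversion: O(num_rows + nnz) extra space, no per-row vectors."""
    nnz = len(row)
    # Count entries per row, then prefix-sum the counts into indptr.
    indptr = [0] * (num_rows + 1)
    for r in row:
        indptr[r + 1] += 1
    for r in range(num_rows):
        indptr[r + 1] += indptr[r]
    # Scatter each edge into the next free slot of its row.
    cursor = indptr[:-1]  # slicing makes a copy of the row start positions
    indices = [0] * nnz
    edge_ids = [0] * nnz  # original COO position of each CSR entry
    for e in range(nnz):
        pos = cursor[row[e]]
        indices[pos] = col[e]
        edge_ids[pos] = e
        cursor[row[e]] += 1
    return indptr, indices, edge_ids
```

Because edges are scattered in input order, entries within each row keep their original relative order (the sort is stable).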

13 Sep, 2021 2 commits

Fixes bug #3312 (#3345) · 983a4fdd

sanchit-misra authored Sep 13, 2021

* Fixes bug #3312

* Fixing lint errors
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

983a4fdd

Fix openmp header (#3325) · e7ea0f53

Quan (Andy) Gan authored Sep 13, 2021

e7ea0f53

07 Sep, 2021 1 commit

[Feature] Support builtin binary message function for heterogeneous graph (#3273) · 298e4fa6

Israt Nisa authored Sep 07, 2021



* Added binary builtinMsgFunc forward() for heterograph

* Added backward for u_op_v

* Supports all binary builtin forward

* Supports binary message funcs with reduce func sum

* lint check

* removed import torch from unittest

* enabled GPU test

* lint check

* Fixed docstrings

* rename func get_hs_id

* edited comment
Co-authored-by: Israt Nisa <nisisrat@amazon.com>

298e4fa6

06 Sep, 2021 1 commit

Remove deprecated kernels (#3316) · c81efdf2

Jinjing Zhou authored Sep 06, 2021

* remove

* remove

* fix

* remove

* remove

c81efdf2

02 Sep, 2021 1 commit

[Performance, CPU] Rewriting OpenMP pragmas into parallel_for (#3171) · f5183820

Tomasz Patejko authored Sep 02, 2021

* [CPU, Parallel] Rewriting omp pragmas with parallel_for

* [CPU, Parallel] Decrease number of calls to task function

* [CPU, Parallel] Modify calls to new interface of parallel_for

f5183820
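
The bullets replace raw OpenMP pragmas with a `parallel_for` helper and reduce how often the task function is called. Presumably this works by handing each call a whole sub-range instead of a single index. Below is a minimal Python analogue of that calling convention. It assumes a signature of the form `parallel_for(begin, end, grain_size, f)` with `f(chunk_begin, chunk_end)`. It is a sketch, not DGL's C++ implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical analogue; the real helper is C++ and may chunk differently.
def parallel_for(
    begin: int,
    end: int,
    grain_size: int,
    f: Callable[[int, int], None],
    num_workers: int = 4,
) -> None:
    """Split [begin, end) into chunks and call f(chunk_begin, chunk_end) once per chunk."""
    if end <= begin:
        return
    n = end - begin
    # At least grain_size items per chunk, and no more chunks than workers.
    chunk = max(1, grain_size, -(-n // num_workers))
    ranges = [(s, min(s + chunk, end)) for s in range(begin, end, chunk)]
    with ThreadPoolExecutor(num_workers) as pool:
        list(pool.map(lambda r: f(*r), ranges))
```

Calling `f` once per chunk amortizes dispatch overhead. The chunk bodies must write disjoint outputs, just as with an OpenMP `parallel for`.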

01 Sep, 2021 1 commit

[Feature] Add a HINT for the per edge type sampler of heterogeneous DistGraph... · f4fe518f

xiang song(charlie.song) authored Sep 01, 2021


[Feature] Add a HINT for the per edge type sampler of heterogeneous DistGraph highlighting that the etypes are already sorted (#3260)

* pass cpp test

* distgraph use sorted edge flag.

* lint

* trigger

* update test
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>

f4fe518f
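
If a node's edges are already grouped by edge type, each type's edges form one contiguous run that binary search can locate. No per-call sort or full scan is needed, which is the kind of work a "sorted" hint lets a sampler skip. Below is a minimal sketch. It assumes one node's neighbor IDs and their edge-type IDs are given as parallel arrays, with a fanout of -1 meaning "take all". The names are hypothetical:

```python
import random
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


# Hypothetical helpers, not the DistGraph sampler's API.
def etype_ranges(nbr_etypes: Sequence[int], num_etypes: int) -> List[Tuple[int, int]]:
    # Valid only when nbr_etypes is sorted: each type occupies one contiguous run.
    return [(bisect_left(nbr_etypes, t), bisect_right(nbr_etypes, t)) for t in range(num_etypes)]


def sample_per_etype(
    nbrs: Sequence[int],
    nbr_etypes: Sequence[int],
    fanouts: Sequence[int],
    rng: random.Random,
) -> List[int]:
    out: List[int] = []
    for t, (lo, hi) in enumerate(etype_ranges(nbr_etypes, len(fanouts))):
        cand = list(nbrs[lo:hi])  # neighbors of edge type t
        k = fanouts[t]
        out.extend(cand if k < 0 or k >= len(cand) else rng.sample(cand, k))
    return out
```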

31 Aug, 2021 1 commit

[CPU][Sampling][Performance] Improve sampling on the CPU. (#3274) · 8e525dad

nv-dlasalle authored Aug 31, 2021



* Optimize sampling

* Stop initialization of array

* Fix includes for linting

* Move comment

* Fix replace
Co-authored-by: Da Zheng <zhengda1936@gmail.com>

8e525dad

24 Aug, 2021 1 commit

fix (#3286) · 85b8fe52

Quan (Andy) Gan authored Aug 24, 2021

85b8fe52

19 Aug, 2021 1 commit

[Performance][Feature] Implement edge excluding in EdgeDataLoader on GPU (#3226) · f6349508

nv-dlasalle authored Aug 19, 2021



* Update filter code

* Add unit tests

* Fixes

* Switch to indices

* Rename functions

* Fix linting

* Fix whitespace

* Add doc

* Fix heterograph

* Change workspace allocation

* Fix linting

* Fix docs in filter.py

* Add todo
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

f6349508
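
Edge exclusion removes, from each sampled subgraph, the edges that appear in an exclusion set, for example the seed edges themselves in link prediction. Returning positions rather than surviving IDs (presumably the point of "Switch to indices" above) lets the caller drop the same entries from several parallel arrays at once. Below is a minimal sketch of such a filter, with a Python set standing in for whatever GPU lookup structure the commit uses. `kept_edge_positions` is hypothetical:

```python
from typing import List, Sequence


# Hypothetical helper; a GPU version would need a device-side lookup, not a set.
def kept_edge_positions(sampled_eids: Sequence[int], exclude_eids: Sequence[int]) -> List[int]:
    """Return positions in sampled_eids whose edge ID is not in exclude_eids."""
    excluded = set(exclude_eids)
    return [i for i, e in enumerate(sampled_eids) if e not in excluded]
```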

18 Aug, 2021 1 commit

fix cuda 11.1 crashing bug (#3265) · a536772e

Quan (Andy) Gan authored Aug 18, 2021

a536772e

17 Aug, 2021 1 commit

[Performance] Cacheline-aligned access for UnifiedTensor (#3254) · 2613f7f0

David Min authored Aug 17, 2021



* Add pytorch-direct version

* remove

* add documentation for UnifiedTensor

* Revert "add documentation for UnifiedTensor"

This reverts commit 63ba42644d4aba197c1cb4ea4b85fa1bc43b8849.

* alignment fix for UnifiedTensor access

* fix linting issue
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>

2613f7f0

02 Aug, 2021 1 commit

[bugfix] Fix curand_init() calls in rowwise sampling (#3196) · f7ce2671

nv-dlasalle authored Aug 02, 2021



* Split out separate generators for each thread

* Amortize cost of curand_init

* Improve readability
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

f7ce2671

28 Jul, 2021 1 commit

[New Feature] Per edge type sampler for to_homogeneous graphs. (#3131) · ba7e7cf9

xiang song(charlie.song) authored Jul 28, 2021



* fix.

* fix.

* fix.

* fix.

* Fix test

* Deprecate old DistEmbedding impl, use synchronized embedding impl

* Basic impl of heterogeneous-on-homogeneous sampling

* make pass

* Pass C++ test

* Add python test code

* lint

* lint

* Add MultiLayerEtypeNeighborSampler

* Add unittest for single machine dataloader

* Add dist dataloader test for edge type sampler

* Fix lint

* fix

* support for per etype sample

* Fix some bug and enable distributed training with per edge sample

* fix

* Now distributed training works

* turn off some mxnet

* turn off mxnet for some dist test

* fix

* upd

* upd according to the comments

* Fix

* Fix test and now distributed works.

* upd

* upd

* Fix

* Fix bug

* remove dead code.

* upd

* Fix

* upd

* Fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>

ba7e7cf9

21 Jul, 2021 1 commit

Remove redundant fill in SPMM kernel (#3166) · de174ada

Jinjing Zhou authored Jul 21, 2021

* remove redundant fill

* trigger ci

de174ada

16 Jul, 2021 1 commit

[Feature][Performance][GPU] Introducing UnifiedTensor for efficient zero-copy... · 905c0aa5

David Min authored Jul 17, 2021

[Feature][Performance][GPU] Introducing UnifiedTensor for efficient zero-copy host memory access from GPU (#3086)

* Add pytorch-direct version

* Initial commit of unified tensor

* Merge branch 'master' of https://github.com/davidmin7/dgl



* Remove unnecessary things

* Fix error message

* Fix/Add descriptions

* whitespace fix

* add unpin

* disable IndexSelectCPUFromGPU with no CUDA

* add a newline for unified_tensor.py

* Apply changes based on feedback

* add 'os' module

* skip unified tensor unit test for cpu only

* Update tests/pytorch/test_unified_tensor.py
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>

* reflect feedback
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>

905c0aa5

13 Jul, 2021 1 commit

[CPU][Kernel] Single socket spmm (#3024) · fac75e16

sanchit-misra authored Jul 13, 2021



* optimizations of spmm for CPU

* Added names of contributors

* Minor code cleanup

* Moved the spmm optimization code to a new header file

* Moved to DGL's logging method

* removed duplicate code between SpMMSumCsr and SpMMCmpCsr

* Changes made to follow Google coding style

* Fixed lint errors in spmm.h

* Fixed some lint errors from spmm_blocking_libxsmm.h

* Fixed lint errors from spmm_blocking_libxsmm.h

* Added comments to SpMMCreateLibxsmmKernel

* to enable building of tests, and other cosmetic changes

* disabling libxsmm on windows

* Put a condition to avoid opt impl for FP64 as libxsmm does not have FP64 support yet

* cosmetic changes and documentation

* cosmetic changes

* to pass lint tests

* replaced multiple allocations for buffers of indices and edges with a single allocation
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

fac75e16

08 Jul, 2021 2 commits

fix cub problem (#3121) · 183d29de

Jinjing Zhou authored Jul 08, 2021

183d29de

[Build] fix various build problems (#3117) · 2f41fcd9

Quan (Andy) Gan authored Jul 08, 2021

2f41fcd9

06 Jul, 2021 1 commit

[Feature] Add Heterograph support on Python for builtin unary msg functions... · 188152b8

Israt Nisa authored Jul 06, 2021


[Feature] Add Heterograph support on Python for builtin unary msg functions (copy_u, copy_e) (#2989)

* heterograph for binary func

* Added SDDMM support

* Added unittest

* added binary test cases

* unary mfuncs works

* Fixed lint err

* lint check and others

* lint check

* fixed import *_hetero issue

* lint check

* replace torch with dgl backend

* lint check

* removed torch from test

* skip mxnet unittest

* skip gpu test

* Remove unused/duplicated code

* minor

* changed data structure of ndata and edata

* lint check

* reorganized

* minor lint

* minor lint

* raise error for udf func

* lint check

* fix for CUDA 10.1

* add a note for future cross-type max/min reducing

* Add support CUDA < 11

* lint check

* tidied C code

* remove dummy GSDDMM_hetero backward implementation
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>

188152b8

23 Jun, 2021 1 commit

[Feature] Biased Neighbor Sampling (#2987) · e56bbafd

Qidong Su authored Jun 23, 2021



* update

* update

* update

* update

* lint

* lint

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* lint

* update

* clone

* update

* update

* update

* update

* replace idarray with ndarray

* refactor cpp part

* refactor python part

* debug

* refactor interface

* test and doc

* lint and test

* lint

* fix

* fix

* fix

* const

* doc

* fix

* fix

* fix

* fix

* fix & doc

* fix

* fix

* update

* update

* update

* merge

* doc

* doc

* lint

* fix

* more tests

* doc

* fix

* fix

* update

* update

* update

* fix

* fix
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

e56bbafd

22 Jun, 2021 1 commit

[Kernel] Add heterograph support in CUDA kernels (SpMM, SDDMM) (#2925) · 1113f674

Israt Nisa authored Jun 21, 2021



* Added heterograph support SpMM, SDDMM

* bug fix cuda stream

* add cudaStrm destroy and fix whitespace

* Added heterograph support SpMM, SDDMM

* bug fix cuda stream

* add cudaStrm destroy and fix whitespace

* changed max stream = 1

* Fixed ctx

* using default stream

* Added heterograph support SpMM, SDDMM

* bug fix cuda stream

* add cudaStrm destroy and fix whitespace

* changed max stream = 1

* Fixed ctx

* using default stream

* fix bug in copy_rhs

* changed by mistake

* minor datatype change

* added datatype check
Co-authored-by: Israt Nisa <nisisrat@amazon.com>

1113f674

10 Jun, 2021 1 commit

[Kernel] Slicing Batched Graphs (#2965) · 5be937a7

Mufei Li authored Jun 10, 2021



* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Update

* Update

* Add files via upload

* Add files via upload

* Update

* Lint

* Add files via upload

* Lint

* Update

* Update

* Update

* Update

* Update

* Lint Fix

* Lint
Co-authored-by: Ubuntu <ubuntu@ip-172-31-12-161.us-west-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>

5be937a7

03 Jun, 2021 1 commit

Add heterograph support in C kernels (#2882) · 75ec5826

Israt Nisa authored Jun 03, 2021



* SpMM for heterograph

* C APIs SDDMM heterograph

* passes initial result

* renamed eid with nid

* aggregation on same ntype for multiple etypes

* fix lint check failure

* lint check part 2

* lint check part 3

* Fixed SpMMCmpCsr Min op

* added mem references

* fixed fill(Max/Min), added const

* removed newline

* brought back docstring
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>

75ec5826

01 Jun, 2021 1 commit

[Feature][Sampler] Sort CSR by tag (#1664) · b8fe2b48

Qidong Su authored Jun 01, 2021



* update

* update

* update

* update

* lint

* lint

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* lint

* update

* clone

* update

* update

* update

* update

* replace idarray with ndarray

* refactor cpp part

* refactor python part

* debug

* refactor interface

* test and doc

* lint and test

* lint

* fix

* fix

* fix

* const

* doc

* fix

* fix

* fix

* fix

* fix & doc

* fix

* fix

* fix

* fix

* fix

* fix

* update
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

b8fe2b48
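
Sorting by tag groups each row's neighbors into contiguous per-tag segments and records where each segment starts. A sampler can then pick a tag first and sample uniformly inside that segment; the Biased Neighbor Sampling entry above appears to build on this. Below is a minimal sketch. It assumes integer tags in `[0, num_tags)` and row-relative offsets. This `sort_csr_by_tag` is a standalone illustration, not DGL's function of the same purpose, and its return layout is an assumption:

```python
from typing import List, Sequence, Tuple


# Standalone sketch; the return layout is an assumption, not DGL's API.
def sort_csr_by_tag(
    indptr: Sequence[int], indices: Sequence[int], tag: Sequence[int], num_tags: int
) -> Tuple[List[int], List[List[int]]]:
    """Stable-sort each row's column indices by tag[col].

    Returns the new indices and, per row, num_tags + 1 offsets relative to the
    row start; tag t occupies [offsets[t], offsets[t + 1]) within the row.
    """
    new_indices: List[int] = []
    tag_offsets: List[List[int]] = []
    for r in range(len(indptr) - 1):
        row = indices[indptr[r]:indptr[r + 1]]
        new_indices.extend(sorted(row, key=lambda c: tag[c]))  # sorted() is stable
        counts = [0] * num_tags
        for c in row:
            counts[tag[c]] += 1
        offsets = [0]
        for cnt in counts:
            offsets.append(offsets[-1] + cnt)
        tag_offsets.append(offsets)
    return new_indices, tag_offsets
```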

20 May, 2021 1 commit

[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings... · ae8dbe6d

nv-dlasalle authored May 20, 2021


[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)

* Split NCCL wrapper from sparse optimizer and sparse embedding

* Add more unit tests for single node nccl

* Fix unit test for tf

* Switch to device histogram

* Fix histogram issues

* Finish migration to histogram

* Handle cases with zero send/receive data

* Start on partition object

* Get compiling

* Updates

* Add unit tests

* Switch to partition object

* Fix linting issues

* Rename partition file

* Add python doc

* Fix python assert and finish doxygen comments

* Remove stubs for range based partition to satisfy pylint

* Wrap unit test in GPU only

* Wrap explicit cuda call in ifdef

* Merge with partition.py

* update docstrings

* Cleanup partition_op

* Add Workspace object

* Switch to using workspace object

* Move last remainder based function out of nccl_api

* Add error messages

* Update docs with examples

* Fix linting errors
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>

ae8dbe6d
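
The bullets mention a histogram of send/receive sizes and a "remainder based" partition. Below is a minimal sketch of a remainder (modulo) partition. It assumes global ID `i` lives on part `i % num_parts` at local index `i // num_parts`. Under that assumption, the per-part histogram gives how many IDs each process must send to every peer before an all-to-all exchange. The names are hypothetical, and no NCCL calls are modeled:

```python
from typing import List, Sequence, Tuple


# Hypothetical helper; the ownership rule i % num_parts is an assumption.
def remainder_partition(ids: Sequence[int], num_parts: int) -> Tuple[List[int], List[List[int]]]:
    """Assign global ID i to part i % num_parts with local index i // num_parts.

    Returns a per-part histogram (send counts) and the local indices grouped
    by destination part.
    """
    counts = [0] * num_parts
    local: List[List[int]] = [[] for _ in range(num_parts)]
    for i in ids:
        part = i % num_parts
        counts[part] += 1
        local[part].append(i // num_parts)
    return counts, local
```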

17 May, 2021 1 commit

[Feature] Python interface for adjacency matrix summation and multiplication (#2893) · 657c220d

Quan (Andy) Gan authored May 17, 2021

* test commit

* fixes

* oops

* add docs

* lint

* why does it say I have a trailing whitespace

* oh ok

* fixes

* why there's an invalid argument error

* address comments

* fix

* address comments

657c220d
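
Treat each graph as a weighted sparse adjacency matrix. Summation then merges edge sets and adds the weights of coinciding edges. Multiplication composes two hops: an edge i→k followed by k→j yields i→j, weighted by the product and summed over k. Below is a minimal dictionary-of-keys sketch of both operations. The names and the `(row, col) -> weight` layout are illustrative, not the Python interface this commit adds:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Illustrative layout, not DGL's: (row, col) -> weight.
SparseMat = Dict[Tuple[int, int], float]


def adj_sum(mats: Iterable[SparseMat]) -> SparseMat:
    out: Dict[Tuple[int, int], float] = defaultdict(float)
    for m in mats:
        for key, w in m.items():
            out[key] += w  # weights of coinciding edges add up
    return dict(out)


def adj_product(a: SparseMat, b: SparseMat) -> SparseMat:
    # Index b by row so each a[i, k] only visits the nonzeros b[k, :].
    b_rows: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for (k, j), w in b.items():
        b_rows[k].append((j, w))
    out: Dict[Tuple[int, int], float] = defaultdict(float)
    for (i, k), w_ik in a.items():
        for j, w_kj in b_rows.get(k, []):
            out[(i, j)] += w_ik * w_kj
    return dict(out)
```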

28 Apr, 2021 1 commit

Fix cu11 compile (#2879) · 703d4b93

xiang song(charlie.song) authored Apr 28, 2021


Co-authored-by: Ubuntu <ubuntu@ip-172-31-1-191.ec2.internal>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

703d4b93