- 16 May, 2022 1 commit
Xin Yao authored
* remove unnecessary induced vertices in EdgeSubgraph * add unit test
- 12 May, 2022 1 commit
nv-dlasalle authored
- 11 May, 2022 1 commit
Rhett Ying authored
* [Dist] Enable maximum try times for socket backend via DGL_DIST_MAX_TRY_TIMES * reset env before/after test * print log for info when trying to connect * fix * print log in python instead of cpp
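The commit above caps connection attempts for the socket backend with the `DGL_DIST_MAX_TRY_TIMES` environment variable and logs each attempt from Python. The sketch below shows a bounded retry loop of that kind. The names `connect_with_retries`, the fallback of 3 tries, and the retry interval are hypothetical choices for illustration. They are not DGL's actual code or defaults.

```python
import os
import time
from typing import Callable, Optional


def connect_with_retries(connect: Callable[[], bool],
                         max_try_times: Optional[int] = None,
                         interval_sec: float = 0.0) -> int:
    """Call `connect` until it returns True or the try budget runs out.

    Hypothetical helper for illustration; not part of DGL's API.
    Returns the number of attempts used, or raises ConnectionError.
    """
    if max_try_times is None:
        # Read the cap from the environment; the fallback of 3 is an assumption.
        max_try_times = int(os.environ.get("DGL_DIST_MAX_TRY_TIMES", "3"))
    for attempt in range(1, max_try_times + 1):
        if connect():
            return attempt
        # Log from Python so users can see why startup is waiting.
        print(f"connect attempt {attempt}/{max_try_times} failed")
        if attempt < max_try_times:
            time.sleep(interval_sec)
    raise ConnectionError(f"failed to connect after {max_try_times} tries")


# Usage: a fake connection that succeeds on the third try.
outcomes = iter([False, False, True])
print(connect_with_retries(lambda: next(outcomes), max_try_times=5))
```

Passing the cap explicitly overrides the environment variable. This keeps the helper easy to test, and the commit's "reset env before/after test" bullet points to the same concern.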
- 27 Apr, 2022 1 commit
Rhett Ying authored
* [Feature] enable socket net_type for rpc * fix lint * fix lint * fix build issue on windows * fix test failure on windows * fix test failure * fix cpp unit test failure * net_type blocking max_try_times * fix other comments * fix lint * fix comment * fix lint * fix cpp
- 26 Apr, 2022 1 commit
ayasar70 authored
* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero based thread assignment * fixing lint issues * Update cub for cuda 11.5 compatibility (#3468) * fixing type mismatch * tx guaranteed to be smaller than nnz. Hence removing last check * minor: updating comment * adding three unit tests for csr slice method to cover some corner cases * timing repeatkernel * clean * clean * clean * updating _SegmentMaskColKernel * Working on requests: removing sorted array check and adding comments to utility functions * fixing lint issue * Optimizing disjoint union kernel * Trying to resolve compilation issue on CI * [EMPTY] Relevant commit message here * applying revision requests on cpu/disjoint_union.cc * removing unnecessary casts * remove extra space
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
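The first bullet above switches `_SegmentCopyKernel` from one thread per row to one thread per nonzero. This keeps threads evenly loaded when row lengths vary. Each thread then has to find the row that owns its nonzero, and a binary search over the CSR `indptr` array answers that. The sequential Python sketch below shows only this index mapping. `row_of_nonzero` and `segment_copy` are hypothetical names, and a real kernel would run the loop body in parallel on the GPU.

```python
from bisect import bisect_right
from typing import List, Tuple


def row_of_nonzero(indptr: List[int], tx: int) -> int:
    """Return the CSR row that owns nonzero `tx` (hypothetical helper).

    Row r owns nonzeros indptr[r] .. indptr[r+1]-1, so the owner is the last
    r with indptr[r] <= tx. bisect_right skips past repeated indptr values,
    so empty rows are never returned.
    """
    return bisect_right(indptr, tx) - 1


def segment_copy(indptr: List[int], data: List[float]) -> List[Tuple[int, float]]:
    """Emulate one 'thread' per nonzero, each emitting (row, value)."""
    nnz = indptr[-1]
    # tx ranges over [0, nnz), so it is always a valid nonzero index.
    return [(row_of_nonzero(indptr, tx), data[tx]) for tx in range(nnz)]


# Row 1 is empty; its nonzeros are simply absent from the output.
print(segment_copy([0, 2, 2, 5], [1.0, 2.0, 3.0, 4.0, 5.0]))
```

The trade-off is an O(log rows) search per thread in exchange for balanced work across threads.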
- 12 Apr, 2022 1 commit
Quan (Andy) Gan authored
* cleaned pl node classification example * conform to PL's method of updating the dataloader * update * lint * fix test * fix
- 11 Apr, 2022 1 commit
Xin Yao authored
* enable uva for pinsage sampler * unit test * modify some checks on the python side * remove legacy random walk code * update unit test * update unit test * fix unit test * adjust checks * move some checks to c++ * move max_nodes check to cuda kernel * fix ci for tf
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
- 09 Apr, 2022 1 commit
Rhett Ying authored
* [BugFix] record/restore pin status when pickle/unpickle * disable test on TF * set version as expected * unpin memory in test
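The bug fix above records a graph's pin status when it is pickled and restores it on unpickling. The general pattern is to store the flag in `__getstate__` and re-apply it in `__setstate__`. In the sketch below, `PinnableBuffer` is a hypothetical stand-in. Its `pin_memory_` only flips a flag instead of page-locking memory, so the example runs without a GPU.

```python
import pickle


class PinnableBuffer:
    """Hypothetical stand-in for an object whose host memory can be pinned."""

    def __init__(self, data):
        self.data = list(data)
        self._pinned = False

    def pin_memory_(self):
        # A real implementation would page-lock host memory here.
        self._pinned = True
        return self

    def is_pinned(self):
        return self._pinned

    def __getstate__(self):
        # Serialize the payload plus the pin status, not the pinned memory itself.
        return {"data": self.data, "pinned": self._pinned}

    def __setstate__(self, state):
        self.data = state["data"]
        self._pinned = False
        if state["pinned"]:
            # Re-pin after unpickling so the copy matches the original's state.
            self.pin_memory_()


buf = PinnableBuffer([1, 2, 3]).pin_memory_()
restored = pickle.loads(pickle.dumps(buf))
print(restored.is_pinned(), restored.data)
```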
- 05 Apr, 2022 1 commit
nv-dlasalle authored
[Examples] Update graphsage multi-gpu example to use multiple GPUs for validation and testing. (#3827) * Update graphsage multi-gpu example to use multiple GPUs for validation and testing. * Remove argmax * Fix rebase error * Add more documentation to example and simplify * Switch to name shared memory * Add comment about how training is distributed * Restore iteration count * fix munmap error reporting for better error messages
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 31 Mar, 2022 1 commit
Quan (Andy) Gan authored
* fix uva with partial node types * lint * skip tensorflow unit test
- 27 Mar, 2022 1 commit
Cheng Wan authored
* upd * upd * upd * upd * upd * fix OpenMP compatibility issues * typo * partition * misc * fix typo * num_parts=1 * import torch * long * print info * print info * print info * upd * remove debug code * revert partition.py * fix cut count * fix cut count * Revert "fix cut count" This reverts commit 10926b4fd48f45c8f1ddb58be7db6c22e653effd. * Revert "fix cut count" This reverts commit 76465283bef093a2b4209ad70dd15d2437b2ec8a. * type of deprecate * typo in deprecate info * fix typo * use cv for partitioning * CE * no message * revert * typo * add objtype * no message * fix bug * fix bug * fix bug * ? * semicolon * drop tensors * no message * backward * backward * max op * store X.shape * th * test * Revert "test" This reverts commit 92b3b2f64a3a1128590098fa03ce429c5466e6ce. * test * tolist * debug * to cuda * tuple * fix bug * remove X * no message * fix bug * workload balance * Revert "workload balance" This reverts commit d7f8e4a16ba2a7eabb4a9bb945523bfe6623e723. * reverse * Revert "reverse" This reverts commit 8a71cf25685aa7d889b9b8881b46f7a16b7d6e6d. * Revert "Revert "reverse"" This reverts commit 196b143932d5cf9813576ece7c990b63d322d063. * Revert "Revert "Revert "reverse""" This reverts commit cf9e89a07013582056e7cde235e51331aca7fa9c. * no message * Merge commit '5498cf05' # Conflicts: # python/dgl/distributed/partition.py * Revert "Merge commit '5498cf05 '" This reverts commit f79be2ad777897c7025b28308454cad81ad6bb27. * fix bug * third party * no message * try to avoid memory leak * try to avoid memory leak * avoid memory leak with no hope * Revert "avoid memory leak with no hope" This reverts commit c77befe9479f46758e744642f66dd209b50eef7d. * no message * Revert "no message" This reverts commit 478cb28fe25fb1002b2f1dc202bb9bdaad8b2a56. * del * Revert "del" This reverts commit 1b468e45ce646b400ff3ffa61a0b2da058b3bdfd. * no message * no message * Revert "no message" This reverts commit 92e4f5561ed42da0606618b2fff9f1ad5ed439d9. 
* third party * document * Update metis_partition.cc * Update metis_partition_hetero.cc * Update metis_partition_hetero.cc * Update partition.py * Update partition.py * Update partition.py
Co-authored-by: yzh119 <expye@outlook.com>
Co-authored-by: chwan-rice <54331508+chwan-rice@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
- 24 Mar, 2022 2 commits
Quan (Andy) Gan authored
* fix * remove setcxx methods * move pin flag to CSR and COO matrix
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Rhett Ying authored
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
- 10 Mar, 2022 1 commit
paoxiaode authored
* Change the curand_init parameter * Change the curand_init parameter * commit * commit
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
- 01 Mar, 2022 1 commit
Quan (Andy) Gan authored
* fix * explain * oops
- 28 Feb, 2022 2 commits
Quan (Andy) Gan authored
* split files * fix
Quan (Andy) Gan authored
* Update randomwalk_gpu.cu * Update randomwalk_gpu.cu
- 27 Feb, 2022 1 commit
Quan (Andy) Gan authored
* huuuuge update * remove * lint * lint * fix * what happened to nccl * update multi-gpu unsupervised graphsage example * replace most of the dgl.mp.process with torch.mp.spawn * update if condition for use_uva case * update user guide * address comments * incorporating suggestions from @jermainewang * oops * fix tutorial to pass CI * oops * fix again
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 23 Feb, 2022 2 commits
sanchit-misra authored
Minjie Wang authored
* WIP: TypedLinear and new RelGraphConv * wip * further simplify RGCN * a bunch of tweak for performance; add basic cpu support * update on segmm * wip: segment.cu * new backward kernel works * fix a bunch of bugs in kernel; leave idx_a for future * add nn test for typed_linear * rgcn nn test * bugfix in corner case; update RGCN README * doc * fix cpp lint * fix lint * fix ut * wip: hgtconv; presorted flag for rgcn * hgt code and ut; WIP: some fix on reorder graph * better typed linear init * fix ut * fix lint; add docstring
- 21 Feb, 2022 1 commit
Quan (Andy) Gan authored
* fixes * fix * more fixes * update * oops * lint? * temporarily revert - will fix in another PR * more fixes * skipping mxnet test * address comments * fix DDP * fix edge dataloader exclusion problems * stupid bug * fix * use_uvm option * fix * fixes * fixes * fixes * fixes * add evaluation for cluster gcn and ddp * stupid bug again * fixes * move sanity checks to only support DGLGraphs * pytorch lightning compatibility fixes * remove * poke * more fixes * fix * fix * disable test * docstrings * why is it getting a memory leak? * fix * update * updates and temporarily disable forkingpickler * update * fix? * fix? * oops * oops * fix * lint * huh * uh * update * fix * made it memory efficient * refine exclude interface * fix tutorial * fix tutorial * fix graph duplication in CPU dataloader workers * lint * lint * Revert "lint" This reverts commit 805484dd553695111b5fb37f2125214a6b7276e9. * Revert "lint" This reverts commit 0bce411b2b415c2ab770343949404498436dc8b2. * Revert "fix graph duplication in CPU dataloader workers" This reverts commit 9e3a8cf34c175d3093c773f6bb023b155f2bd27f.
Co-authored-by: xiny <xiny@nvidia.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
- 18 Feb, 2022 2 commits
ayasar70 authored
* Based on issue #3436. Improving _SegmentCopyKernel's GPU utilization by switching to nonzero based thread assignment * fixing lint issues * Update cub for cuda 11.5 compatibility (#3468) * fixing type mismatch * tx guaranteed to be smaller than nnz. Hence removing last check * minor: updating comment * adding three unit tests for csr slice method to cover some corner cases * timing repeatkernel * clean * clean * clean * updating _SegmentMaskColKernel * Working on requests: removing sorted array check and adding comments to utility functions * fixing lint issue
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Jinjing Zhou authored
* add * fix * fix * fix * fix * add * add * fix * fix * fix * new loader * fix * fix * fix for 3.6 * fix * add * add recipes and also some bug fixes * fix * fix * fix * fix recipes * allow AsNodeDataset to work on ogb * add ut * many fixes for nodepred-ns pipeline * recipe for nodepred-ns * Update enter/README.md Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com> * fix layers * fix * fix * fix * fix * fix multiple issues * fix for citation2 * fix comment * fix * fix * clean up * fix
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com>
- 15 Feb, 2022 1 commit
Israt Nisa authored
* init * init * working cublasGemm * benchmark high-mem/low-mem, err gather_mm output * cuda kernel for bmm like kernel * removed cpu copy for E_per_Rel * benchmark code from Minjie * fixed cublas results in gathermm sorted * use GPU shared mem in unsorted gather mm * minor * Added an optimal version of gather_mm_unsorted * lint * init gather_mm_scatter * cublas transpose added * fixed h_offset for multiple rel * backward unittest * cublas support to transpose W * adding missed file * forgot to add header file * lint * lint * cleanup * lint * docstring * lint * added unittest * lint * lint * unittest * changed err type * skip cpu test * skip CPU code * move in-len loop inside * lint * added check different dim length for B * w_per_len is optional now * moved gather_mm to pytorch/backend with backward support * removed a_/b_trans support * transpose op inside GEMM call * removed out alloc from API, changed W 2D to 3D * Added se_gather_mm, Separate API for sortedE * Fixed gather_mm (unsorted) user interface * unsorted gmm backward + separate CAPI for un/sorted A * typecast to float to support atomicAdd * lint typecast * lint * added gather_mm_scatter * minor * const * design changes * Added idx_a, idx_b support gmm_scatter * dgl doc * lint * adding gather_mm in ops * lint * lint * minor * removed benchmark files * minor * empty commit
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
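The commit above adds `gather_mm`. Going by the bullets on `idx_b` and the 3D `W`, the unsorted case multiplies each row of `a` by a weight matrix chosen per row, `out[i] = a[i] @ b[idx_b[i]]`. The pure-Python sketch below shows only these semantics on nested lists. Its signature is a hypothetical illustration, not DGL's operator, which uses cuBLAS and custom CUDA kernels.

```python
from typing import List

Matrix = List[List[float]]


def matvec_row(row: List[float], w: Matrix) -> List[float]:
    """Compute row @ w for a length-D_in row and a D_in x D_out matrix."""
    d_out = len(w[0])
    return [sum(row[k] * w[k][j] for k in range(len(row))) for j in range(d_out)]


def gather_mm(a: Matrix, b: List[Matrix], idx_b: List[int]) -> Matrix:
    """out[i] = a[i] @ b[idx_b[i]] (hypothetical illustration, not DGL's API)."""
    assert len(a) == len(idx_b), "need one weight index per row of a"
    # Gather the weight matrix for each row, then multiply row by row.
    return [matvec_row(a[i], b[idx_b[i]]) for i in range(len(a))]


b = [[[1, 0], [0, 1]],  # type 0: identity
     [[2, 0], [0, 3]]]  # type 1: diag(2, 3)
print(gather_mm([[1, 2], [1, 2], [4, 5]], b, idx_b=[0, 1, 1]))
```

The sorted variant named in the bullets presumably exploits rows already grouped by type. Each contiguous group can then be handled as one dense matrix product instead of many per-row products.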
- 11 Feb, 2022 1 commit
ranzhejiang authored
* [feature] edge softmax refact. * delete file * fix backward and cmake version * fix backward * format function * fix setting * refix * refix * refix * refix * refix * refix * refix * refix * refix * refix * refix * refix * add cuda kernel for backward and rename some function * add benchmark for edge_softmax * fix format * remove cuda_backwrd * fix code format and add comment for op on CPU * fix lint
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
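Edge softmax normalizes each edge score over the edges that share its destination node:

`a_e = exp(s_e) / sum(exp(s_f) for f with dst(f) == dst(e))`

The commit refactors both the forward and the backward pass, so the sketch below includes both. The backward pass uses the standard softmax gradient:

`grad_s_e = a_e * (g_e - sum(a_f * g_f for f with dst(f) == dst(e)))`

This is an illustrative plain-Python version with hypothetical function names, not DGL's CPU or CUDA kernel.

```python
import math
from collections import defaultdict
from typing import List


def edge_softmax(dst: List[int], scores: List[float]) -> List[float]:
    """Softmax of edge scores, normalized per destination node (hypothetical helper)."""
    # Subtract the per-destination max before exp() to avoid overflow.
    max_by_dst = defaultdict(lambda: -math.inf)
    for v, s in zip(dst, scores):
        max_by_dst[v] = max(max_by_dst[v], s)
    exps = [math.exp(s - max_by_dst[v]) for v, s in zip(dst, scores)]
    sum_by_dst = defaultdict(float)
    for v, e in zip(dst, exps):
        sum_by_dst[v] += e
    return [e / sum_by_dst[v] for v, e in zip(dst, exps)]


def edge_softmax_backward(dst: List[int], out: List[float],
                          grad_out: List[float]) -> List[float]:
    """Gradient w.r.t. the scores, given the forward output and upstream grads."""
    dot_by_dst = defaultdict(float)
    for v, a, g in zip(dst, out, grad_out):
        dot_by_dst[v] += a * g
    return [a * (g - dot_by_dst[v]) for v, a, g in zip(dst, out, grad_out)]


# Edges 0 and 1 point to node 0, edge 2 points to node 1.
print(edge_softmax([0, 0, 1], [0.0, 0.0, 5.0]))
```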
- 09 Feb, 2022 1 commit
Xin Yao authored
* implement pin_memory/unpin_memory/is_pinned for dgl.graph * update python docstring * update c++ docstring * add test * fix the broken UnifiedTensor * XPU_SWITCH for kDLCPUPinned * a rough version ready for testing * eliminate extra context parameter for pin/unpin * update train_sampling * fix linting * fix typo * multi-gpu uva sampling case * disable new format materialization for pinned graphs * update python doc for pin_memory_ * fix unit test * UVA sampling for link prediction * dispatch most csr ops * update graphsage example to combine uva sampling and UnifiedTensor * update graphsage example to combine uva sampling and UnifiedTensor * update graphsage example to combine uva sampling and UnifiedTensor * update doc * update examples * change unitgraph and heterograph's PinMemory to in-place * update examples for multi-gpu uva sampling * update doc * fix linting * fix cpu build * fix is_pinned for DistGraph * fix is_pinned for DistGraph * update graphsage unsupervised example * update doc for gpu sampling * update some check for sampling device switching * fix linting * adapt for new dataloader * fix linting * fix * fix some name issue * adjust device check * add unit test for uva sampling & fix some zero_copy bug * fix linting * update num_threads in graphsage examples
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
- 26 Jan, 2022 1 commit
Rhett Ying authored
* [Feature] long live server for multiple client groups * generate globally unique name for DistTensor within DGL automatically
- 21 Jan, 2022 1 commit
Xin Yao authored
* implement pin_memory/unpin_memory/is_pinned for dgl.graph * update python docstring * update c++ docstring * add test * fix the broken UnifiedTensor * eliminate extra context parameter for pin/unpin * fix linting * fix typo * disable new format materialization for pinned graphs * update python doc for pin_memory_ * fix unit test * update doc * change unitgraph and heterograph's PinMemory to in-place * update comments for NDArray's PinMemory_ and PinData * update doc
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
- 19 Jan, 2022 1 commit
Rhett Ying authored
* [Fix] reduce error msg, refine fetch logic of available ports * un-initialize client before sending shutdown request * fix import error * print connect failure log only in debug mode * enable DMLC_LOG_DEBUG=1 in CI
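A common way to fetch an available port is to bind a socket to port 0 and let the operating system choose one. A minimal sketch with the standard `socket` module follows. The commit does not say whether DGL's refined logic works this way, and `get_available_port` is a hypothetical helper.

```python
import socket


def get_available_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port that was free at the time of the call (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 asks the OS to assign any free ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = get_available_port()
print(f"free port: {port}")
```

This check is inherently racy: another process may claim the port between this call and the server's own `bind`. Callers should therefore still handle bind failures.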
- 17 Jan, 2022 2 commits
Quan (Andy) Gan authored
* oops * test
Quan (Andy) Gan authored
* fix GPU global negative sampling code * Update negative_sampling.cu
- 11 Jan, 2022 2 commits
MaoYuan Xian authored
* Pass the std::min argument's type, to avoid the compilation error. * Update parallel_for.h * Update negative_sampling.cc
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Rhett Ying authored
* [Feature] enable TP::Receiver wait for any numbers of senders * fix random unit test failure * avoid endless future wait * fix unit test failure * fix seg fault when finalize wait in receiver * [Feature] refactor sender connect logic and remove unnecessary sleeps in unit tests * fix lint * release RPCContext resources before process exits * [Debug] TPReceiver wait start log * [Debug] add log in get port * [Debug] add log * [ReDebug] revert time sleep in unit tests * [Debug] remove sleep for test_distri,test_mp * [debug] add more log * [debug] add listen_booted_ flag * [debug] restore commented code for queue * [debug] sleep more in rpc_client * restore change in tests * Revert "restore change in tests" This reverts commit 41a18926d181ec2517069389bfc41de2cc949280. * Revert "[debug] sleep more in rpc_client" This reverts commit a908e758eabca0a6ce62eb2e59baea02a840ac67. * Revert "[debug] restore commented code for queue" This reverts commit d3f993b3746e6bb6e2cc2f90204dd7e9461c6301. * Revert "[debug] add listen_booted_ flag" This reverts commit 244b2167d94942ff2a0acec8823b974975e52580. * Revert "[debug] add more log" This reverts commit 4b78447b0a575a824821dc7e25cca2246e6e30e2. * Revert "[Debug] remove sleep for test_distri,test_mp" This reverts commit e1df1aadcc8b1c2a0013ed77322ac391a8807612. * remove debug code * revert unnecessary change * revert unnecessary changes * always reset RPCContext when get started and reset all data * remove time.sleep in dist tests * fix lint * reset envs before each dist test * reset env properly * add time sleep when start each server * sleep for a while when boot server * replace wait_thread with callback * fix lint * add dglconnect handshake check
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
- 10 Jan, 2022 1 commit
Quan (Andy) Gan authored
- 07 Jan, 2022 1 commit
Quan (Andy) Gan authored
* first commit * a bunch of fixes * add unique * lint * lint * lint * address comments * Update negative_sampler.py * fix * description * address comments and fix * fix * replace unique with replace * test pylint * Update negative_sampler.py
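The commit above adds global negative sampling: draw node pairs uniformly and keep only those that are not edges of the graph. One bullet also replaces a `unique` option with `replace`. A rejection-sampling sketch follows. `sample_negatives` and its parameters are hypothetical and do not reproduce the signature of DGL's sampler.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def sample_negatives(num_nodes: int, edges: Set[Edge], num_samples: int,
                     replace: bool = False, exclude_self_loops: bool = True,
                     max_rounds: int = 100, seed: int = 0) -> List[Edge]:
    """Uniformly sample (src, dst) pairs that are not in `edges`.

    Hypothetical helper for illustration; not DGL's sampler API.
    Best effort: may return fewer than `num_samples` pairs if too few
    non-edges are found within `max_rounds` rounds of rejection sampling.
    """
    rng = random.Random(seed)
    result: List[Edge] = []
    seen: Set[Edge] = set()
    for _ in range(max_rounds):
        need = num_samples - len(result)
        if need <= 0:
            break
        # Draw one candidate per missing sample, then reject the bad ones.
        for _ in range(need):
            pair = (rng.randrange(num_nodes), rng.randrange(num_nodes))
            if pair in edges:
                continue  # a positive edge, not a negative
            if exclude_self_loops and pair[0] == pair[1]:
                continue
            if not replace and pair in seen:
                continue  # without replacement, keep each pair at most once
            seen.add(pair)
            result.append(pair)
    return result


print(sample_negatives(4, {(0, 1), (1, 2), (2, 0)}, num_samples=5))
```

Rejection sampling is cheap when the graph is sparse, because almost every random pair is a non-edge. For dense graphs, the best-effort cap on rounds keeps the loop from running indefinitely.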
- 04 Jan, 2022 1 commit
Quan (Andy) Gan authored
* support shared memory on windows * Update shared_mem.cc
- 19 Dec, 2021 1 commit
hirayaku authored
* fix CopyVectorToNDArray * Fix lint
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
- 16 Dec, 2021 1 commit
Israt Nisa authored
[Feature] Add CUDA support for `min` and `max` reducer in heterogeneous API for unary message functions (#3566) * CUDA support max/min reducer on forward pass * docstring * concised UpdateGradMinMax_hetero * reorganized UpdateGradMinMax_hetero * CUDA kernels for max/min reducer * variable name * lint check * changed CUDA 2D thread mapping to 1D * removed legacy cusparse for min/max reducer * git CI issue * restarting git CI * adding namespace std
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
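For a `max` reducer (and symmetrically `min`), the backward pass sends each node's gradient only to the message that won the reduction. The sketch below records the winning edge per destination in the forward pass and scatters the gradient back to it in the backward pass. The function names are hypothetical. The sketch covers a single edge type on CPU, whereas the commit targets heterogeneous graphs on CUDA.

```python
from typing import Dict, List, Tuple


def max_reduce(dst: List[int], msgs: List[float]) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Forward: per-destination max of edge messages plus the winning edge id.

    Hypothetical helper for illustration; not DGL's kernel.
    """
    out: Dict[int, float] = {}
    arg: Dict[int, int] = {}
    for eid, (v, m) in enumerate(zip(dst, msgs)):
        if v not in out or m > out[v]:
            out[v], arg[v] = m, eid  # ties keep the first edge seen
    return out, arg


def max_reduce_backward(num_edges: int, arg: Dict[int, int],
                        grad_out: Dict[int, float]) -> List[float]:
    """Backward: each node's gradient flows only to its argmax edge."""
    grad_msgs = [0.0] * num_edges
    for v, eid in arg.items():
        grad_msgs[eid] += grad_out[v]
    return grad_msgs


out, arg = max_reduce([0, 0, 1, 1], [1.0, 3.0, 2.0, -1.0])
print(out, arg, max_reduce_backward(4, arg, {0: 10.0, 1: 5.0}))
```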
- 15 Dec, 2021 2 commits
lixiaobai authored
* Feat: support API "randomwalk_topk" in library * Feat: use the new API "randomwalk_topk" for PinSAGESampler * Minor * Minor * Refactor: modified codes as checker required * Minor * Minor * Minor * Minor * Fix: checking errors in RandomWalkTopk * Refactor: modified the docstring for randomwalk_topk * change randomwalk_topk to internal * fix * rename * Minor for pinsage.py * Feat: support randomwalk and SelectPinSageNeighbors on GPU Port RandomWalk algorithm on GPU, and port SelectPinSageNeighbors on GPU. * Feat: support GPU on python APIs * Feat: remove perf print information in FrequenchHashmap * Fix: modified the code format Modified the code format as task_lint.sh suggested * Feat: let test script support PinSAGESampler on GPU Let test script support PinSAGESampler on GPU, minor of "restart_prob". * Minor * Minor * Minor * Refactor: use the atomic operations from the array module * Minor: change the long lines * Refactor: modified the get_node_types for gpu * Feat: update the contributor date * Perf: remove unnecessary stream sync * Feat: support other random walk But the non-uniform choice is still not supported. * Fix: add CUDA switch for random walk
Co-authored-by: Quan Gan <coin2028@hotmail.com>
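The commit above uses `randomwalk_topk` for PinSAGE-style neighbor selection. This runs random walks with restart from a seed node, counts visits to other nodes, and keeps the `k` most visited as the seed's neighbors. A CPU-only sketch follows. `random_walk_topk` and its parameters (`num_walks`, `walk_length`, `restart_prob`) are hypothetical names, though the commit does mention a `restart_prob`.

```python
import random
from collections import Counter
from typing import Dict, List


def random_walk_topk(adj: Dict[int, List[int]], seed_node: int, k: int,
                     num_walks: int = 100, walk_length: int = 5,
                     restart_prob: float = 0.5, seed: int = 0) -> List[int]:
    """Return up to k nodes most often visited by walks from `seed_node`.

    Hypothetical helper for illustration; not DGL's randomwalk_topk.
    """
    rng = random.Random(seed)
    visits: Counter = Counter()
    for _ in range(num_walks):
        node = seed_node  # every walk (re)starts at the seed
        for _ in range(walk_length):
            neighbors = adj.get(node, [])
            if not neighbors:
                break  # dead end: end this walk early
            node = rng.choice(neighbors)
            if node != seed_node:
                visits[node] += 1
            if rng.random() < restart_prob:
                break  # restart: the next walk begins at the seed again
    # most_common orders ties by first visit, so the result is deterministic.
    return [node for node, _ in visits.most_common(k)]


adj = {0: [1, 3], 1: [2], 2: [0], 3: [1]}
print(random_walk_topk(adj, seed_node=0, k=2))
```

Visit counts favor nodes that are reachable through many short paths, not just direct neighbors. This is why PinSAGE-style samplers use walk-based importance rather than plain adjacency.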
Vasimuddin Md authored
* added distgnn plus libra codebase * Dist application codes * added comments in partition code. changed the interface of partitioning call. * updated readme * create libra partitioning branch for the PR * removed disgnn files for first PR * updated kernel.cc * added libra_partition.cc and moved libra code from kernel.cc to libra_partition.cc * fixed lint error; merged libra2dgl.py and main_Libra.py to libra_partition.py; added graphsage/distgnn folder and partition script. * removed libra2dgl.py * fixed the lint error and cleaned the code. * revisions due to PR comments. added distgnn/tools contains partitions routines * update 2 PR revision I * fixed errors; also improved the runtime by 10x. * fixed minor lint error * fixed some more lints * PR revision II changed the interface of libra partition function * rewrite docstring
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>