- 23 Aug, 2022 1 commit
Xin Yao authored
- 18 Aug, 2022 1 commit
Daniil Sizov authored
* Add helper method for temporary affinitization of compute threads
* Rework DL affinitization as single helper
* Add example usage in benchmarks
* Fix Python linter warnings
* Fix affinity helper params
* Use NUMA node 0 cores only by default
* Fix benchmarks
* Fix lint errors
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 15 Aug, 2022 1 commit
Xin Yao authored
- 12 Aug, 2022 1 commit
Xin Yao authored
* Change CUDA_MAX_NUM_THREADS to 256
* Change the grid configuration
- 09 Aug, 2022 1 commit
Xin Yao authored
- 01 Aug, 2022 1 commit
Xin Yao authored
* enable use for weighted neighbor sampler and biased random walk
* add unit tests
* fix for MXNet/TF
* fix typo
- 29 Jul, 2022 1 commit
Xin Yao authored
* add weighted sampling without replacement (A-Chao)
* improve Algorithm A-Chao with block-wise prefix sum
* correctly fill out_idxs
* implement weighted sampling with replacement
* small fix
* merge host-side code of weighted/uniform sampling
* enable unit tests for CUDA weighted sampling
* move thrust/cub wrapper to the cmake file
* update docs accordingly
* fix linting
* fix linting
* fix unit test
* Bump external CUB/Thrust versions
* Fix code style and update description of algorithm design
* [Feature] GPU support weighted graph neighbor sampling commit by pengqirong (OPPO)
* merge pengqirong's implementation
* revert the change to cub and thrust
* fix linting
* use DeviceSegmentedSort for better performance
* add more comments
* add necessary notes
* add necessary notes
* resolve some comments
* define THRUST_CUB_WRAPPED_NAMESPACE
* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>
- 27 Jul, 2022 1 commit
Rhett Ying authored
* [Log] fix confusing error log in TCPSocket::Bind()
* fix lint
- 26 Jul, 2022 1 commit
Dewvin authored
* [Feature] Add CUDA Weighted Randomwalk Sampling
* [Feature] Add CUDA Weighted Randomwalk Sampling
* [Feature] Add CUDA Weighted Randomwalk Sampling
* [Feature] Add CUDA Weighted Randomwalk Sampling
* fix empty prob array && enable non-uniform for restart && enable unit tests
* update doc and guide for randomwalk and pinsage
* update comments
Co-authored-by: zhenliangqiu <ubuntu@ip-172-31-24-245.ap-southeast-1.compute.internal>
Co-authored-by: xiny <xiny@nvidia.com>
- 15 Jul, 2022 1 commit
Quan (Andy) Gan authored
- 09 Jul, 2022 1 commit
Xin Yao authored
- 07 Jul, 2022 1 commit
Xin Yao authored
- 01 Jul, 2022 2 commits
Rhett Ying authored
Rhett Ying authored
* [Feature] extend sort_csr/csc_by_tag to edge
* fix test failure in tensorflow
* refine sorting by edges
* fix docstring
* remove unnecessary mem
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 29 Jun, 2022 1 commit
nv-dlasalle authored
* Update NCCL communicator for when NCCL is missing
* Use static_cast
* Add doc string
* Fix whitespace
* Restrict unit test to GPU runs
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 27 Jun, 2022 2 commits
ndickson-nvidia authored
* Added missing `__half` specializations of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cuBLAS
* Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM
* Added missing instantiation of `DLDataTypeTraits<__half>::dtype`
* Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary
* Worked around a compile error in a particular setup where `__half` can't be constructed on the host side
* Fixed linter formatting errors
* Changes to comments as recommended
* Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
Rhett Ying authored
* [BugFix] fix RPC-related build issue on macOS
* add warning message
* add warning message
- 24 Jun, 2022 1 commit
nv-dlasalle authored
* Add UVA by default to embedding
* More updates
* Update optimizer
* Add new UVA functions
* Expose new pinned memory function
* Add unit tests
* Update formatting
* Fix unit test
* Handle auto UVA case when training is on CPU
* Allow per-embedding decisions for whether to use UVA
* Address sparse_optim.py comments
* Remove unused templates
* Update unit test
* Use dgl allocate memory for pinning
* allow automatic unpinning
* workaround for d2h copy with a different dtype
* fix linting
* update error message
* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
- 23 Jun, 2022 2 commits
Xin Yao authored
* Explicitly unpin tensoradapter allocated arrays
* Undo unrelated change
* Add unit test
* update unit test
* add pinned_by_dgl flag to NDArray::Container
* use dgl.ndarray for holding the pinning status
* update multi-GPU UVA inference
* reinterpret cast NDArray::Container* to DLTensor* in MoveAsDLTensor
* update unpin column and examples
* add unit test for unpin column
Co-authored-by: Dominique LaSalle <dlasalle@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Triston authored
* Fix a CUB compile error for CUDA 11.5
* Fix comparison of integer expressions of different signedness in coo_sort.cu
* Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu
* Remove unreferenced variable in spmm.cu
* Fix comparison of integer expressions of different signedness in rowwise_pick.h
* Fix comparison of integer expressions of different signedness in choice.cc
* Remove unreferenced variable col_data in spmat_op_impl_coo.cc
* Remove unreferenced variable allowed in global_uniform.cc
* Fix comparison of integer expressions of different signedness in graph.cc
* Fix comparison of integer expressions of different signedness in graph_apis.cc
* Fix the unused ctx variable in ndarray_partition.cc for CPU-only builds
* Fix comparison of integer expressions of different signedness in libra_partition.cc
* Fix comparison of integer expressions of different signedness in graph_op.cc
Co-authored-by: Triston Cao <tristonc@nvidia.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 20 Jun, 2022 1 commit
Rhett Ying authored
- 14 Jun, 2022 1 commit
nv-dlasalle authored
* Disable non-atomic atomic operations
* Improve error message
* Make error message more friendly
- 11 Jun, 2022 1 commit
Xin Yao authored
* Wrap all CUDA runtime API/CUB calls with a macro
* remove the usage of explicit cudaMalloc in favor of AllocWorkspace
* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>
- 08 Jun, 2022 1 commit
Rhett Ying authored
* [Dist] enable timeout when fetching msg
* fix lint error
* minor refinements
* improve minor log
* fix dist test
* fix timeout issue in tensorpipe
- 07 Jun, 2022 1 commit
ndickson-nvidia authored
* Added specialization of cublasGemm function for `__half` type, to try to address https://github.com/dmlc/dgl/issues/3988
* Added USE_FP16 guard
* Added test cases to test_segment_mm, to test newly-added FP16 specialization of cublasGemm
* Replaced for loop in test_segment_mm with pytest.mark.parametrize, as recommended
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 06 Jun, 2022 3 commits
ndickson-nvidia authored
* Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
* Fixed an issue with previous check for FP16 support
* Removed FP16 type checks, since they should no longer be needed
* Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures. Unfortunately, atomicCAS for unsigned short seems to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.
* Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value
Quan (Andy) Gan authored
Co-authored-by: Xin Yao <xiny@nvidia.com>
Xin Yao authored
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
- 28 May, 2022 3 commits
Quan (Andy) Gan authored
* change warning message
* Update tensordispatch.cc
Quan (Andy) Gan authored
This reverts commit fdd1fe19.
Quan (Andy) Gan authored
- 26 May, 2022 1 commit
nv-dlasalle authored
* Enable FP16 for GPU builds in CI
* Limit default GPU archs to Pascal and above
* Disable FP16 dispatching for CUDA architectures less than 60
* Fix linting
* Fix typos
- 25 May, 2022 1 commit
Minjie Wang authored
* Cython nogil
* move APIs to internal and add unit test
* fix lint
* disable callback array test
- 17 May, 2022 1 commit
paoxiaode authored
* Change the curand_init parameter
* Change the curand_init parameter
* commit
* commit
* change the curandState and launch dim of CSRRowwiseSample kernel
* commit
* keep _CSRRowWiseSampleReplaceKernel in sync
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
- 16 May, 2022 2 commits
nv-dlasalle authored
* Explicitly unpin tensoradapter allocated arrays
* Undo unrelated change
* Add unit test
* update unit test
Xin Yao authored
* remove unnecessary induced vertices in EdgeSubgraph
* add unit test
- 12 May, 2022 1 commit
nv-dlasalle authored
- 11 May, 2022 1 commit
Rhett Ying authored
* [Dist] Enable maximum try times for socket backend via DGL_DIST_MAX_TRY_TIMES
* reset env before/after test
* print log for info when trying to connect
* fix
* print log in Python instead of C++
- 27 Apr, 2022 1 commit
Rhett Ying authored
* [Feature] enable socket net_type for RPC
* fix lint
* fix lint
* fix build issue on Windows
* fix test failure on Windows
* fix test failure
* fix cpp unit test failure
* net_type blocking max_try_times
* fix other comments
* fix lint
* fix comment
* fix lint
* fix cpp
- 26 Apr, 2022 1 commit
ayasar70 authored
* Based on issue #3436: improve `_SegmentCopyKernel`'s GPU utilization by switching to nonzero-based thread assignment
* fixing lint issues
* Update cub for CUDA 11.5 compatibility (#3468)
* fixing type mismatch
* tx is guaranteed to be smaller than nnz, hence removing the last check
* minor: updating comment
* adding three unit tests for csr slice method to cover some corner cases
* timing repeatkernel
* clean
* clean
* clean
* updating _SegmentMaskColKernel
* Working on requests: removing sorted array check and adding comments to utility functions
* fixing lint issue
* Optimizing disjoint union kernel
* Trying to resolve compilation issue on CI
* [EMPTY] Relevant commit message here
* applying revision requests on cpu/disjoint_union.cc
* removing unnecessary casts
* remove extra space
Co-authored-by: Abdurrahman Yasar <ayasar@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>