- 11 Oct, 2022 1 commit
Hongzhi (Steve), Chen authored
* Auto fix c++.
* reformat
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
- 21 Sep, 2022 1 commit
Xin Yao authored
* disable warning for tensorpipe
* fix warning
* enable lint check for cuh files
* resolve comments
- 19 Sep, 2022 2 commits
nv-dlasalle authored
* updates
* Enable caching C++ result
* Add missing docstring
* Remove unused function
* Add unit test
* Address comments
Xin Yao authored
* rename `DLContext` to `DGLContext`
* rename `kDLGPU` to `kDLCUDA`
* replace DLTensor with DGLArray
* fix linting
* Unify DGLType and DLDataType to DGLDataType
* Fix FFI
* rename DLDeviceType to DGLDeviceType
* decouple dlpack from the core library
* fix bug
* fix lint
* fix merge
* fix build
* address comments
* rename dl_converter to dlpack_convert
* remove redundant comments
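A simplified sketch of what "decouple dlpack from the core library" can look like: the core traffics only in its own DGLArray/DGLContext types, and a single translation unit (the renamed dlpack_convert) maps them onto DLPack's structs. All struct definitions below are minimal stand-ins, not DGL's actual headers.

```cuda
#include <cstdint>

// Minimal stand-ins for the post-rename DGL types (simplified).
struct DGLContext { int device_type; int device_id; };
struct DGLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DGLArray {
  void* data; DGLContext ctx; int ndim;
  DGLDataType dtype; int64_t* shape; int64_t* strides;
};

// Minimal stand-ins for DLPack's structs (normally from dlpack/dlpack.h).
struct DLDevice { int device_type; int device_id; };
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
  void* data; DLDevice device; int ndim;
  DLDataType dtype; int64_t* shape; int64_t* strides; uint64_t byte_offset;
};

// The whole DLPack dependency lives in one translation unit; the core
// library only ever sees DGLArray.
DLTensor ToDLPack(const DGLArray& a) {
  return DLTensor{
      a.data,
      {a.ctx.device_type, a.ctx.device_id},
      a.ndim,
      {a.dtype.code, a.dtype.bits, a.dtype.lanes},
      a.shape,
      a.strides,
      /*byte_offset=*/0};
}
```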
- 15 Sep, 2022 1 commit
Xin Yao authored
* add set_stream
* add .record_stream for NDArray and HeteroGraph
* refactor dgl stream Python APIs
* test record_stream
* add unit test for record stream
* use pytorch's stream
* fix lint
* fix cpu build
* address comments
* address comments
* add record stream tests for dgl.graph
* record frames and update dataloader
* add docstring
* update frame
* add backend check for record_stream
* remove CUDAThreadEntry::stream
* record stream for newly created formats
* fix bug
* fix cpp test
* fix None c_void_p to c_handle
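Some context on why `record_stream` exists: a caching allocator only knows the stream a buffer was allocated (and will be freed) on; if a second stream also uses the buffer, the allocator must be told not to recycle the memory until that stream's pending work completes. A minimal CUDA sketch of the underlying mechanism, independent of DGL's actual implementation:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                            \
  do {                                                              \
    cudaError_t err = (call);                                       \
    if (err != cudaSuccess) {                                       \
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
      exit(1);                                                      \
    }                                                               \
  } while (0)

int main() {
  cudaStream_t producer, consumer;
  CHECK_CUDA(cudaStreamCreate(&producer));
  CHECK_CUDA(cudaStreamCreate(&consumer));

  void* buf = nullptr;
  const size_t nbytes = 1 << 20;
  CHECK_CUDA(cudaMalloc(&buf, nbytes));
  CHECK_CUDA(cudaMemsetAsync(buf, 0, nbytes, producer));  // produced on A

  // Order the consumer after the producer before reusing the buffer there.
  cudaEvent_t produced, last_use;
  CHECK_CUDA(cudaEventCreateWithFlags(&produced, cudaEventDisableTiming));
  CHECK_CUDA(cudaEventRecord(produced, producer));
  CHECK_CUDA(cudaStreamWaitEvent(consumer, produced, 0));

  // The buffer is now also used on `consumer`. "Recording" the stream
  // means remembering an event after its last use there, so the allocator
  // will not hand the memory out again until that work has finished.
  CHECK_CUDA(cudaMemsetAsync(buf, 1, nbytes, consumer));
  CHECK_CUDA(cudaEventCreateWithFlags(&last_use, cudaEventDisableTiming));
  CHECK_CUDA(cudaEventRecord(last_use, consumer));

  CHECK_CUDA(cudaEventSynchronize(last_use));  // allocator-side wait
  CHECK_CUDA(cudaFree(buf));
  CHECK_CUDA(cudaEventDestroy(produced));
  CHECK_CUDA(cudaEventDestroy(last_use));
  CHECK_CUDA(cudaStreamDestroy(producer));
  CHECK_CUDA(cudaStreamDestroy(consumer));
  return 0;
}
```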
- 06 Sep, 2022 1 commit
Chang Liu authored
* Use an internal cuda stream for CopyDataFromTo
* small fix white space
* Fix to compile
* Make stream optional in copydata for compile
* fix lint issue
* Update cub functions to use internal stream
* Lint check
* Update CopyTo/CopyFrom/CopyFromTo to use internal stream
* Address comments
* Fix backward CUDA stream
* Avoid overloading CopyFromTo()
* Minor comment update
* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>
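The gist of this change, as a hedged sketch (names illustrative, not DGL's signatures): the copy runs on a dedicated internal stream and synchronizes only that stream, so it neither runs on nor serializes the legacy default stream.

```cuda
#include <cuda_runtime.h>

// Hedged sketch: route a copy through a dedicated internal stream.
void CopyDataFromTo(const void* from, void* to, size_t nbytes,
                    cudaStream_t internal_stream) {
  // cudaMemcpyDefault lets UVA infer the direction (H2D/D2H/D2D).
  cudaMemcpyAsync(to, from, nbytes, cudaMemcpyDefault, internal_stream);
  // Waits on this stream only: in-flight work on user streams (e.g. the
  // backward pass) is neither blocked nor serialized by the copy.
  cudaStreamSynchronize(internal_stream);
}
```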
- 05 Sep, 2022 2 commits
peizhou001 authored
* Enable turning libxsmm on/off at runtime by adding a global config and related API
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
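A sketch of what a runtime libxsmm switch can look like: one process-global flag consulted at kernel-dispatch time, with a setter exposed through the FFI. The names below are hypothetical; DGL's actual config object and API may differ.

```cuda
#include <atomic>

namespace dgl {

std::atomic<bool> g_use_libxsmm{true};  // global config, on by default

void SetUseLibxsmm(bool enable) { g_use_libxsmm.store(enable); }
bool GetUseLibxsmm() { return g_use_libxsmm.load(); }

void SpMMDispatch(/* graph, features, out, ... */) {
  if (GetUseLibxsmm()) {
    // dispatch to the libxsmm-backed CPU kernel
  } else {
    // fall back to the generic CPU kernel
  }
}

}  // namespace dgl
```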
nv-dlasalle authored
* Remove async_transferer
* remove test
* Remove AsyncTransferer
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
- 31 Aug, 2022 1 commit
Xin Yao authored
* Allocate tensors in DGL's current stream
* make tensoradaptor stream-aware
* replace TAempty with cpu allocator
* fix typo
* try fix cpu allocation
* clean header
* redirect AllocDataSpace as well
* resolve comments
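A sketch of the stream-ordered allocation this points at, using the CUDA 11.2+ memory-pool API (whether DGL routes through cudaMallocAsync or its own pool is not stated here; function names are illustrative): memory allocated in the current stream is usable by work queued on that stream without a device-wide sync.

```cuda
#include <cuda_runtime.h>

// Sketch only: stream-ordered allocation via cudaMallocAsync (CUDA 11.2+).
void* AllocDataSpace(size_t nbytes, cudaStream_t stream) {
  void* ptr = nullptr;
  cudaMallocAsync(&ptr, nbytes, stream);
  return ptr;
}

void FreeDataSpace(void* ptr, cudaStream_t stream) {
  // Freed in stream order as well: safe even if prior work queued on
  // `stream` still reads `ptr`.
  cudaFreeAsync(ptr, stream);
}
```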
- 23 Aug, 2022 1 commit
Xin Yao authored
- 18 Aug, 2022 1 commit
Daniil Sizov authored
* Add helper method for temporary affinitization of compute threads
* Rework DL affinitization as single helper
* Add example usage in benchmarks
* Fix python linter warnings
* Fix affinity helper params
* Use NUMA node 0 cores only by default
* Fix benchmarks
* Fix lint errors
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
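A Linux-only sketch of the affinitization primitive such a helper builds on (the real helper is temporary, restores the previous mask afterwards, and covers dataloader/OpenMP workers as well; names are illustrative):

```cuda
#include <pthread.h>
#include <sched.h>
#include <vector>

// Pin the calling thread to an explicit core list, e.g. the cores of
// NUMA node 0 (the commit's default). Returns true on success.
bool PinToCores(const std::vector<int>& cores) {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  for (int c : cores) CPU_SET(c, &mask);
  return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask) == 0;
}
```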
- 15 Aug, 2022 1 commit
Xin Yao authored
- 12 Aug, 2022 1 commit
Xin Yao authored
* Change CUDA_MAX_NUM_THREADS to 256
* change the configuration of grid
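For reference, a typical launch-configuration helper this kind of change flows through; the 256 cap is the commit's value, everything else is illustrative. Smaller blocks often help register-heavy kernels because more blocks can be resident per SM.

```cuda
#include <algorithm>
#include <cstdint>

constexpr int kCudaMaxNumThreads = 256;  // the new cap from this commit

// Illustrative helper: block size capped at 256, grid sized by ceiling
// division so each element gets one thread.
inline void LaunchConfig(int64_t n, int* nthreads, int64_t* nblocks) {
  *nthreads = static_cast<int>(
      std::max<int64_t>(1, std::min<int64_t>(n, kCudaMaxNumThreads)));
  *nblocks = (n + *nthreads - 1) / *nthreads;
}
```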
- 09 Aug, 2022 1 commit
Xin Yao authored
- 01 Aug, 2022 1 commit
Xin Yao authored
* enable use for weighted neighbor sampler and biased random walk
* add unit tests
* fix for mxnet/tf
* fix typo
- 29 Jul, 2022 1 commit
Xin Yao authored
* add weighted sampling without replacement (A-Chao)
* improve Algorithm A-Chao with block-wise prefix sum
* correctly fill out_idxs
* implement weighted sampling with replacement
* small fix
* merge host-side code of weighted/uniform sampling
* enable unit tests for cuda weighted sampling
* move thrust/cub wrapper to the cmake file
* update docs accordingly
* fix linting
* fix linting
* fix unit test
* Bump external CUB/Thrust versions
* Fix code style and update description of algorithm design
* [Feature] GPU support weighted graph neighbor sampling, commit by pengqirong (OPPO)
* merge pengqirong's implementation
* revert the change to cub and thrust
* fix linting
* use DeviceSegmentedSort for better performance
* add more comments
* add necessary notes
* add necessary notes
* resolve some comments
* define THRUST_CUB_WRAPPED_NAMESPACE
* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>
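Algorithm A-Chao is weighted reservoir sampling without replacement: keep a reservoir of k items and admit item i with probability k*w_i/W_i, where W_i is the running weight sum. A simplified single-threaded sketch (the CUDA version parallelizes W_i with the block-wise prefix sum mentioned above; the correction for items with k*w_i/W_i > 1 is omitted here):

```cuda
#include <cstdint>
#include <random>
#include <vector>

// Simplified A-Chao: sample k indices from weights w without replacement.
std::vector<int64_t> AChao(const std::vector<double>& w, int64_t k,
                           std::mt19937& rng) {
  std::uniform_real_distribution<double> uni(0.0, 1.0);
  std::vector<int64_t> reservoir;
  double wsum = 0.0;
  for (int64_t i = 0; i < static_cast<int64_t>(w.size()); ++i) {
    wsum += w[i];  // W_i, the running weight sum
    if (i < k) {
      reservoir.push_back(i);  // the first k items always enter
    } else if (uni(rng) < k * w[i] / wsum) {
      // admit item i with probability k*w_i/W_i, evicting a uniform slot
      reservoir[static_cast<int64_t>(uni(rng) * k)] = i;
    }
  }
  return reservoir;
}
```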
- 27 Jul, 2022 1 commit
Rhett Ying authored
* [Log] fix confusing error log in TCPSocket::Bind()
* fix lint
- 26 Jul, 2022 1 commit
Dewvin authored
* [Feature] Add CUDA Weighted Randomwalk Sampling
* fix empty prob array && enable non-uniform for restart && enable unit tests
* update doc and guide for randomwalk and pinsage
* update comments
Co-authored-by: zhenliangqiu <ubuntu@ip-172-31-24-245.ap-southeast-1.compute.internal>
Co-authored-by: xiny <xiny@nvidia.com>
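A device-side sketch of one weighted random-walk step, sampling a neighbor by inverse CDF over edge weights on a CSR graph; a production kernel would precompute per-node prefix sums and binary-search rather than linear-scan. All names are illustrative.

```cuda
#include <curand_kernel.h>

// Draw u ~ U(0, sum(prob)) over the current node's out-edges and pick the
// neighbor where the running sum crosses u.
__device__ int64_t WeightedStep(const int64_t* indptr, const int64_t* indices,
                                const float* prob, int64_t cur,
                                curandState* state) {
  const int64_t beg = indptr[cur], end = indptr[cur + 1];
  float total = 0.f;
  for (int64_t e = beg; e < end; ++e) total += prob[e];
  if (total <= 0.f) return -1;  // dead end; caller may terminate or restart
  float u = curand_uniform(state) * total;
  for (int64_t e = beg; e < end; ++e) {
    u -= prob[e];
    if (u <= 0.f) return indices[e];
  }
  return indices[end - 1];  // guard against floating-point rounding
}
```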
- 15 Jul, 2022 1 commit
Quan (Andy) Gan authored
- 09 Jul, 2022 1 commit
Xin Yao authored
- 07 Jul, 2022 1 commit
Xin Yao authored
- 01 Jul, 2022 2 commits
Rhett Ying authored
Rhett Ying authored
* [Feature] extend sort_csr/csc_by_tag to edge
* fix test failure in tensorflow
* refine sorting by edges
* fix docstring
* remove unnecessary mem
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 29 Jun, 2022 1 commit
nv-dlasalle authored
* Update nccl communicator for when NCCL is missing
* Use static_cast
* Add doc string
* Fix whitespace
* Restrict unit test to GPU runs
Co-authored-by: Xin Yao <xiny@nvidia.com>
- 27 Jun, 2022 2 commits
ndickson-nvidia authored
* Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas
* Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM
* Added missing instantiation of DLDataTypeTraits<__half>::dtype
* Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary
* Worked around a compile error in some particular setup, where __half can't be constructed on the host side
* Fixed linter formatting errors
* Changes to comments as recommended
* Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
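An illustrative instance of the missing-`__half`-specialization pattern, including the host-side `__half` construction pitfall noted above (converting from float rather than constructing `__half` directly on the host). This mirrors the shape of such code, not DGL's actual `Full` implementation.

```cuda
#include <cstdint>
#include <cuda_fp16.h>

// Generic fill kernel, instantiated for __half as well as float/double.
template <typename T>
__global__ void FullKernel(T* out, T val, int64_t n) {
  const int64_t i =
      static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) out[i] = val;
}

// Host-side entry point: the value is converted with __float2half instead
// of constructing a __half on the host, sidestepping the compile error
// seen on setups where host-side __half construction fails.
void FullHalf(__half* out, float val, int64_t n, cudaStream_t stream) {
  const int nt = 256;
  const int nb = static_cast<int>((n + nt - 1) / nt);
  FullKernel<__half><<<nb, nt, 0, stream>>>(out, __float2half(val), n);
}
```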
Rhett Ying authored
* [BugFix] fix rpc-related build issue on mac OS
* add warning message
* add warning message
- 24 Jun, 2022 1 commit
nv-dlasalle authored
* Add uva by default to embedding
* More updates
* Update optimizer
* Add new uva functions
* Expose new pinned memory function
* Add unit tests
* Update formatting
* Fix unit test
* Handle auto UVA case when training is on CPU
* Allow per-embedding decisions for whether to use UVA
* Address sparse_optim.py comments
* Remove unused templates
* Update unit test
* Use dgl allocate memory for pinning
* allow automatically unpin
* workaround for d2h copy with a different dtype
* fix linting
* update error message
* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
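A hedged sketch of the UVA idea behind these entries: page-lock (pin) the host-resident embedding table so GPU kernels can read it in place across the bus, instead of staging the whole table in device memory. Function names are illustrative, not DGL's API.

```cuda
#include <cuda_runtime.h>

// Pin an existing host buffer and obtain a device-usable view of it.
void* PinForUVA(void* host_ptr, size_t nbytes) {
  cudaHostRegister(host_ptr, nbytes, cudaHostRegisterMapped);
  void* dev_view = nullptr;
  // Under UVA this is the same address; the explicit query is kept for
  // clarity on platforms without unified addressing.
  cudaHostGetDevicePointer(&dev_view, host_ptr, 0);
  return dev_view;
}

void Unpin(void* host_ptr) { cudaHostUnregister(host_ptr); }
```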
- 23 Jun, 2022 2 commits
Xin Yao authored
* Explicitly unpin tensoradapter allocated arrays
* Undo unrelated change
* Add unit test
* update unit test
* add pinned_by_dgl flag to NDArray::Container
* use dgl.ndarray for holding the pinning status
* update multi-gpu uva inference
* reinterpret cast NDArray::Container* to DLTensor* in MoveAsDLTensor
* update unpin column and examples
* add unit test for unpin column
Co-authored-by: Dominique LaSalle <dlasalle@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Triston authored
* Fix a cub compile error for CUDA 11.5
* Fix comparison of integer expressions of different signedness in coo_sort.cu file
* Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file
* Remove never referenced variable in spmm.cu
* Fix comparison of integer expressions of different signedness in rowwise_pick.h file
* Fix comparison of integer expressions of different signedness in choice.cc file
* Remove never referenced variable col_data in spmat_op_impl_coo.cc
* Remove never referenced variable allowed in global_uniform.cc
* Fix comparison of integer expressions of different signedness in graph.cc
* Fix comparison of integer expressions of different signedness in graph_apis.cc
* Fix the un-used ctx variable in ndarray_partition.cc file for cpu only build
* Fix comparison of integer expressions of different signedness in libra_partition.cc
* Fix comparison of integer expressions of different signedness in graph_op.cc
Co-authored-by: Triston Cao <tristonc@nvidia.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 20 Jun, 2022 1 commit
Rhett Ying authored
- 14 Jun, 2022 1 commit
nv-dlasalle authored
* Disable non-atomic atomic operations
* Improve error message
* Make error message more friendly
- 11 Jun, 2022 1 commit
Xin Yao authored
* Wrap all CUDA runtime API/CUB calls with macro
* remove the usage of explicit cudaMalloc in favor of AllocWorkspace
* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>
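The usual shape of such a macro (DGL's is `CUDA_CALL`; the exact message formatting below is illustrative): every runtime/CUB call is checked at the call site, so a failure surfaces immediately with file and line instead of corrupting a later call.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CALL(func)                                    \
  do {                                                     \
    cudaError_t e = (func);                                \
    if (e != cudaSuccess) {                                \
      fprintf(stderr, "CUDA error %s at %s:%d\n",          \
              cudaGetErrorString(e), __FILE__, __LINE__);  \
      abort();                                             \
    }                                                      \
  } while (0)

// Usage: CUDA_CALL(cudaMemsetAsync(ptr, 0, nbytes, stream));
```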
- 08 Jun, 2022 1 commit
Rhett Ying authored
* [dist] enable timeout when fetching msg
* fix lint error
* minor refinements
* improve minor log
* fix dist test
* fix timeout issue in tensorpipe
- 07 Jun, 2022 1 commit
ndickson-nvidia authored
* Added specialization of cublasGemm function for `__half` type, to try to address https://github.com/dmlc/dgl/issues/3988
* Added USE_FP16 guard
* Added test cases to test_segment_mm, to test newly-added FP16 specialization of cublasGemm
* Replaced for loop in test_segment_mm with pytest.mark.parametrize, as recommended
Co-authored-by: Xin Yao <xiny@nvidia.com>
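An illustrative `__half` specialization of a templated GEMM wrapper, routed to cublasHgemm; whether DGL dispatches to cublasHgemm or cublasGemmEx, and with which compute type, is not stated here.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Generic wrapper, specialized per element type elsewhere.
template <typename T>
cublasStatus_t Xgemm(cublasHandle_t h, cublasOperation_t ta,
                     cublasOperation_t tb, int m, int n, int k,
                     const T* alpha, const T* A, int lda, const T* B, int ldb,
                     const T* beta, T* C, int ldc);

// FP16 specialization: forward to the native half-precision GEMM.
template <>
cublasStatus_t Xgemm<__half>(cublasHandle_t h, cublasOperation_t ta,
                             cublasOperation_t tb, int m, int n, int k,
                             const __half* alpha, const __half* A, int lda,
                             const __half* B, int ldb, const __half* beta,
                             __half* C, int ldc) {
  return cublasHgemm(h, ta, tb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}
```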
- 06 Jun, 2022 3 commits
ndickson-nvidia authored
* Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
* Fixed an issue with previous check for FP16 support
* Removing FP16 type checks, since they should no longer be needed
* Fixed AtomicAdd to be atomic for `float` and `double` for old GPU architectures. Unfortunately, atomicCAS for unsigned short is unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.
* Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value
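The canonical CAS-loop pattern for making AtomicAdd truly atomic where no native instruction exists (this is the CUDA Programming Guide's pre-sm_60 double example, not DGL's exact code); note it returns the old value, matching the last bullet. No 16-bit atomicCAS exists before sm_70, which is why `half` stays non-atomic on old GPUs.

```cuda
#include <cuda_runtime.h>

// Emulate atomicAdd(double*) via atomicCAS on the 64-bit word.
__device__ double AtomicAddDouble(double* address, double val) {
  unsigned long long int* address_as_ull =
      reinterpret_cast<unsigned long long int*>(address);
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    // Retry until no other thread modified the word between read and CAS.
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);  // old value, like native atomicAdd
}
```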
Quan (Andy) Gan authored
Co-authored-by: Xin Yao <xiny@nvidia.com>
Xin Yao authored
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
- 28 May, 2022 3 commits
Quan (Andy) Gan authored
* change warning message
* Update tensordispatch.cc
Quan (Andy) Gan authored
This reverts commit fdd1fe19.
Quan (Andy) Gan authored