- 12 May, 2022 (1 commit)
nv-dlasalle authored
- 18 Oct, 2021 (1 commit)
nv-dlasalle authored
- 15 Oct, 2021 (1 commit)
David Min authored
* Add pytorch-direct version
* remove
* add documentation for UnifiedTensor
* Revert "add documentation for UnifiedTensor" (reverts commit 63ba42644d4aba197c1cb4ea4b85fa1bc43b8849)
* add boundary check for UVM IndexSelect
* relocate boundary check index kernels to cuda
* fix function name
* fix index kernel in nccl api
* fix argument ordering
* simplify code
* Add a comment for the uvm version

Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
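For context, a minimal sketch of how the UnifiedTensor added in this commit is typically used. It assumes a CUDA-enabled DGL build that ships dgl.contrib.UnifiedTensor; the graph, feature name, and sizes here are made up for illustration:

    # Hedged sketch: wrap a CPU feature tensor so GPU kernels can read it in
    # place over CUDA unified virtual addressing (UVM) instead of copying the
    # whole feature matrix to device memory first.
    import torch as th
    import dgl

    g = dgl.rand_graph(10_000, 50_000)             # host-resident graph
    g.ndata['feat'] = th.rand(g.num_nodes(), 128)  # CPU feature tensor

    feat = dgl.contrib.UnifiedTensor(g.ndata['feat'], device=th.device('cuda:0'))

    # Gathering with GPU-resident indices exercises the UVM IndexSelect path;
    # the boundary check added in this commit guards against out-of-range ids.
    nids = th.randint(0, g.num_nodes(), (1024,), device='cuda:0')
    batch_feat = feat[nids]                        # result lives on the GPU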
- 06 Sep, 2021 (1 commit)
Jinjing Zhou authored
* remove
* remove
* fix
* remove
* remove
- 27 Jun, 2021 (1 commit)
Jinjing Zhou authored
* fix
* remove nvidiasmi
* fix
* fix docs
* fix
* fix
* 1
* fix
* remove
* skip deprecated kernel
* fix
* Revert "skip deprecated kernel" (reverts commit c5ceb7f60dbbaf065b81cc3680757fd611d90ad3)
* fix
- 23 Jun, 2021 (1 commit)
nv-dlasalle authored
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
- 11 Jun, 2021 (1 commit)
nv-dlasalle authored
* Split from NCCL PR
* Fix typo in comment
* Expand documentation for sparse_all_to_all_push
* Restore previous behavior in example
* Re-work optimizer to use NCCL based on gradient location
* Allow for running with embedding on CPU but using NCCL for gradient exchange
* Optimize single partition case
* Fix pylint errors
* Add missing include
* Fix gradient indexing
* Fix line continuation
* Migrate 'first_step'
* Skip tests without enough GPUs to run NCCL
* Improve empty tensor handling for pytorch 1.5
* Fix indentation
* Allow multiple NCCL communicators to coexist
* Improve handling of empty messages
* Update python/dgl/nn/pytorch/sparse_emb.py
* Update python/dgl/nn/pytorch/sparse_emb.py
* Keep empty tensors dimensionless
* th.empty -> th.tensor
* Preserve shape for empty non-zero dimension tensors
* Use shared state when embedding is shared
* Add support for gathering an embedding
* Fix typo
* Fix more typos
* Fix backend call
* Use NodeDataLoader to take advantage of DDP
* Update training script to share memory
* Only squeeze the last dimension
* Better handle empty messages
* Keep the embedding on the target GPU if dgl_sparse is false in the RGCN example
* Fix typo in comment
* Add asserts
* Improve documentation in example

Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
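As a rough usage sketch of what these commits enable: a dgl.nn.NodeEmbedding trained with the sparse optimizer, where the embedding rows may stay on CPU while gradients are exchanged over NCCL under multi-GPU DDP. Names and signatures reflect the DGL of this era and may have shifted since; the initializer, batch, and loss are invented for illustration:

    # Hedged sketch of the sparse-embedding training step these commits target.
    # Assumes dgl.nn.NodeEmbedding and dgl.optim.SparseAdam as shipped around
    # this time; run single-process here, though the NCCL gradient-exchange
    # path is what kicks in when the same loop runs under DDP on GPUs.
    import torch as th
    from dgl.nn import NodeEmbedding
    from dgl.optim import SparseAdam

    def initializer(emb):
        th.nn.init.uniform_(emb)
        return emb

    # Embedding storage can live on CPU; gradients produced on GPU are still
    # routed over NCCL per the commits above.
    emb = NodeEmbedding(10_000, 128, 'node_emb', init_func=initializer)
    opt = SparseAdam(params=[emb], lr=0.01)

    nids = th.randint(0, 10_000, (1024,))
    feat = emb(nids, th.device('cpu'))   # gather rows for this mini-batch
    loss = feat.pow(2).mean()            # stand-in objective
    loss.backward()
    opt.step()                           # push sparse gradients back to storage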
- 20 May, 2021 (1 commit)
nv-dlasalle authored
[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)

* Split NCCL wrapper from sparse optimizer and sparse embedding
* Add more unit tests for single-node NCCL
* Fix unit test for tf
* Switch to device histogram
* Fix histogram issues
* Finish migration to histogram
* Handle cases with zero send/receive data
* Start on partition object
* Get compiling
* Updates
* Add unit tests
* Switch to partition object
* Fix linting issues
* Rename partition file
* Add python doc
* Fix python assert and finish doxygen comments
* Remove stubs for range-based partition to satisfy pylint
* Wrap unit test in GPU only
* Wrap explicit cuda call in ifdef
* Merge with partition.py
* update docstrings
* Cleanup partition_op
* Add Workspace object
* Switch to using workspace object
* Move last remainder-based function out of nccl_api
* Add error messages
* Update docs with examples
* Fix linting errors

Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
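For reference, a hedged sketch of the "partition object" this PR introduces: it tells the NCCL wrapper which rank owns which rows of a node embedding. This assumes the dgl.partition.NDArrayPartition class that the partition object evolved into, with the remainder-based mode the commits mention; sizes are illustrative only:

    # Hedged sketch: describe how embedding rows are split across GPUs.
    # Under 'remainder' mode, row i belongs to rank (i % num_parts), so
    # ownership is computed on the fly with no explicit mapping table.
    import dgl

    num_nodes, num_gpus = 10_000, 4
    part = dgl.partition.NDArrayPartition(num_nodes, num_gpus, mode='remainder')

The modulo-based scheme is what lets the sparse optimizer route each gradient row to its owning GPU with cheap arithmetic rather than a lookup.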