- 11 Jun, 2021 1 commit
-
-
nv-dlasalle authored
* Split from NCCL PR * Fix type in comment * Expand documentation for sparse_all_to_all_push * Restore previous behavior in example * Re-work optimizer to use NCCL based on gradient location * Allow for running with embedding on CPU but using NCCL for gradient exchange * Optimize single partition case * Fix pylint errors * Add missing include * fix gradient indexing * Fix line continuation * Migrate 'first_step' * Skip tests without enough GPUs to run NCCL * Improve empty tensor handling for pytorch 1.5 * Fix indentation * Allow multiple NCCL communicator to coexist * Improve handling of empty message * Update python/dgl/nn/pytorch/sparse_emb.py Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> * Update python/dgl/nn/pytorch/sparse_emb.py Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com> * Keepy empty tensor dimensionaless * th.empty -> th.tensor * Preserve shape for empty non-zero dimension tensors * Use shared state, when embedding is shared * Add support for gathering an embedding * Fix typo * Fix more typos * Fix backend call * Use NodeDataLoader to take advantage of ddp * Update training script to share memory * Only squeeze last dimension * Better handle empty message * Keep embedding on the target device GPU if dgl_sparse if false in RGCN example * Fix typo in comment * Add asserts * Improve documentation in example Co-authored-by:
xiang song(charlie.song) <classicxsong@gmail.com>
-
- 20 May, 2021 1 commit
-
-
nv-dlasalle authored
[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825) * Split NCCL wrapper from sparse optimizer and sparse embedding * Add more unit tests for single node nccl * Fix unit test for tf * Switch to device histogram * Fix histgram issues * Finish migration to histogram * Handle cases with zero send/recieve data * Start on partition object * Get compiling * Updates * Add unit tests * Switch to partition object * Fix linting issues * Rename partition file * Add python doc * Fix python assert and finish doxygen comments * Remove stubs for range based partition to satisfy pylint * Wrap unit test in GPU only * Wrap explicit cuda call in ifdef * Merge with partition.py * update docstrings * Cleanup partition_op * Add Workspace object * Switch to using workspace object * Move last remainder based function out of nccl_api * Add error messages * Update docs with examples * Fix linting erros Co-authored-by:xiang song(charlie.song) <classicxsong@gmail.com>
-