[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)
* Split NCCL wrapper from sparse optimizer and sparse embedding
* Add more unit tests for single node nccl
* Fix unit test for tf
* Switch to device histogram
* Fix histogram issues
* Finish migration to histogram
* Handle cases with zero send/receive data
* Start on partition object
* Get compiling
* Updates
* Add unit tests
* Switch to partition object
* Fix linting issues
* Rename partition file
* Add python doc
* Fix python assert and finish doxygen comments
* Remove stubs for range-based partition to satisfy pylint
* Wrap unit test in GPU only
* Wrap explicit CUDA call in ifdef
* Merge with partition.py
* Update docstrings
* Clean up partition_op
* Add Workspace object
* Switch to using workspace object
* Move last remainder-based function out of nccl_api
* Add error messages
* Update docs with examples
* Fix linting errors
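The bullets about the device histogram, zero-size send/receive data, and the remainder-based partition object all concern one preparatory step: deciding which rank owns each node ID and grouping IDs by destination before any communication happens. A minimal CPU sketch of that step follows, assuming a remainder (modulo) partition in which global ID `i` lives on part `i % num_parts` with local ID `i // num_parts`. `RemainderPartition` and `generate_permutation` are hypothetical names chosen for illustration, not DGL's API, and the sequential counting loop stands in for the GPU histogram the commit refers to.

```python
# Hypothetical sketch (not DGL's API): remainder-based partitioning of node IDs
# plus the per-destination histogram and permutation used to pack a send buffer.
# A sequential CPU loop stands in for the device histogram.
from typing import List, Tuple


class RemainderPartition:
    """Assumed layout: global ID i -> part i % num_parts, local ID i // num_parts."""

    def __init__(self, num_parts: int) -> None:
        if num_parts <= 0:
            raise ValueError("num_parts must be positive")
        self.num_parts = num_parts

    def part_of(self, global_id: int) -> int:
        return global_id % self.num_parts

    def to_local(self, global_id: int) -> int:
        return global_id // self.num_parts

    def to_global(self, part: int, local_id: int) -> int:
        return local_id * self.num_parts + part

    def generate_permutation(self, ids: List[int]) -> Tuple[List[int], List[int]]:
        """Return (perm, counts): perm stably groups input positions by
        destination part; counts[p] is how many IDs go to part p (may be 0)."""
        # Histogram: number of IDs destined for each part.
        counts = [0] * self.num_parts
        for gid in ids:
            counts[self.part_of(gid)] += 1
        # Exclusive prefix sum: starting offset of each part in the send buffer.
        offsets = [0] * self.num_parts
        for p in range(1, self.num_parts):
            offsets[p] = offsets[p - 1] + counts[p - 1]
        # Scatter each input position into its part's slot, preserving order.
        perm = [0] * len(ids)
        for pos, gid in enumerate(ids):
            p = self.part_of(gid)
            perm[offsets[p]] = pos
            offsets[p] += 1
        return perm, counts
```

With three parts, IDs `[7, 2, 9, 4, 5]` give `counts == [1, 2, 2]` and `perm == [2, 0, 3, 1, 4]`, so the packed send buffer is `[9, 7, 4, 2, 5]`. An empty input simply yields all-zero counts, which is the zero send/receive case the commit mentions.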
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
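The title describes communicating node embeddings and sparse gradients between GPUs. A hedged, single-process simulation of what such a sparse all-to-all exchange might look like: each rank holds `(id, gradient)` rows, routes every row to its owning rank under an assumed remainder partition, and the owner sums duplicate rows for the same ID. `sparse_all_to_all` and `accumulate` are hypothetical helpers; a real implementation would typically exchange per-destination counts first and then issue NCCL send/receive calls on device buffers instead of copying Python lists.

```python
# Hypothetical single-process simulation of a sparse all-to-all exchange of
# gradient rows between ranks. Ownership assumes a remainder partition:
# row with global ID gid belongs to rank gid % world_size.
from typing import Dict, List, Tuple

Row = Tuple[int, List[float]]  # (global node ID, gradient vector)


def sparse_all_to_all(per_rank_rows: List[List[Row]]) -> List[List[Row]]:
    world_size = len(per_rank_rows)
    # send_buffers[src][dst] collects the rows rank src must ship to rank dst.
    send_buffers: List[List[List[Row]]] = [
        [[] for _ in range(world_size)] for _ in range(world_size)
    ]
    for src, rows in enumerate(per_rank_rows):
        for gid, grad in rows:
            send_buffers[src][gid % world_size].append((gid, grad))
    # Each rank receives, in source-rank order, every row destined for it.
    # Empty buffers are legal: a rank may send or receive nothing.
    received: List[List[Row]] = []
    for dst in range(world_size):
        inbox: List[Row] = []
        for src in range(world_size):
            inbox.extend(send_buffers[src][dst])
        received.append(inbox)
    return received


def accumulate(rows: List[Row]) -> Dict[int, List[float]]:
    """Sum gradient rows that share an ID (several ranks may touch one node)."""
    out: Dict[int, List[float]] = {}
    for gid, grad in rows:
        if gid in out:
            out[gid] = [a + b for a, b in zip(out[gid], grad)]
        else:
            out[gid] = list(grad)
    return out
```

For example, with two ranks holding `[(0, [1.0]), (3, [2.0])]` and `[(3, [0.5]), (4, [1.0])]`, rank 0 receives IDs 0 and 4, rank 1 receives both rows for ID 3, and `accumulate` combines them into `{3: [2.5]}`.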
* cmake/util/FindNccl.cmake (new file, mode 100644)
* python/dgl/cuda/__init__.py (new file, mode 100644)
* python/dgl/cuda/nccl.py (new file, mode 100644)
* src/partition/partition_op.h (new file, mode 100644)
* src/runtime/cuda/nccl_api.cu (new file, mode 100644)
* src/runtime/cuda/nccl_api.h (new file, mode 100644)
* src/runtime/workspace.h (new file, mode 100644)
* tests/compute/test_nccl.py (new file, mode 100644)
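The Workspace object (added in src/runtime/workspace.h and adopted by the "Add Workspace object" and "Switch to using workspace object" changes) is presumably a scoped scratch allocation for temporaries such as histogram and permutation arrays. Its interface is not shown here, so the following is only a Python analogue of the general scoped (RAII-style) pattern, assuming buffers are borrowed from a pool on entry and returned on exit. `WorkspacePool` and `ScopedWorkspace` are hypothetical and do not reflect the C++ class's actual signature.

```python
# Hypothetical Python analogue of a scoped workspace: a scratch buffer is
# borrowed from a pool on entry and always returned on exit, even when an
# exception is raised. Not the interface of DGL's C++ Workspace class.
from typing import Dict, List, Optional


class WorkspacePool:
    def __init__(self) -> None:
        self.free: Dict[int, List[bytearray]] = {}  # size -> reusable buffers
        self.live = 0  # buffers currently handed out

    def alloc(self, nbytes: int) -> bytearray:
        self.live += 1
        bucket = self.free.get(nbytes)
        if bucket:
            return bucket.pop()  # reuse; contents are NOT zeroed
        return bytearray(nbytes)

    def release(self, buf: bytearray) -> None:
        self.live -= 1
        self.free.setdefault(len(buf), []).append(buf)


class ScopedWorkspace:
    def __init__(self, pool: WorkspacePool, nbytes: int) -> None:
        self.pool = pool
        self.nbytes = nbytes
        self.buf: Optional[bytearray] = None

    def __enter__(self) -> bytearray:
        self.buf = self.pool.alloc(self.nbytes)
        return self.buf

    def __exit__(self, exc_type, exc, tb) -> bool:
        assert self.buf is not None
        self.pool.release(self.buf)
        self.buf = None
        return False  # never swallow exceptions from the with-block
```

Because `__exit__` always releases the buffer, an exception inside the `with` block cannot leak it, and a later request of the same size reuses the returned buffer instead of allocating a new one.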