    [Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings... · ae8dbe6d
    nv-dlasalle authored
    
    [Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)
    
    * Split NCCL wrapper from sparse optimizer and sparse embedding
    
    * Add more unit tests for single-node NCCL
    
    * Fix unit test for tf
    
    * Switch to device histogram
    
    * Fix histogram issues
    
    * Finish migration to histogram
    
    * Handle cases with zero send/receive data
    
    * Start on partition object
    
    * Get compiling
    
    * Updates
    
    * Add unit tests
    
    * Switch to partition object
    
    * Fix linting issues
    
    * Rename partition file
    
    * Add Python doc
    
    * Fix Python assert and finish Doxygen comments
    
    * Remove stubs for range-based partition to satisfy pylint
    
    * Wrap unit test in GPU only
    
    * Wrap explicit CUDA call in ifdef
    
    * Merge with partition.py
    
    * Update docstrings
    
    * Cleanup partition_op
    
    * Add Workspace object
    
    * Switch to using workspace object
    
    * Move last remainder-based function out of nccl_api
    
    * Add error messages
    
    * Update docs with examples
    
    * Fix linting errors
    Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
test_partition.cc 1.71 KB