1. 04 Nov, 2022 1 commit
  2. 29 Jun, 2022 1 commit
  3. 20 May, 2021 1 commit
    • nv-dlasalle's avatar
      [Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings... · ae8dbe6d
      nv-dlasalle authored
      
      [Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)
      
      * Split NCCL wrapper from sparse optimizer and sparse embedding
      
      * Add more unit tests for single node nccl
      
      * Fix unit test for tf
      
      * Switch to device histogram
      
      * Fix histgram issues
      
      * Finish migration to histogram
      
      * Handle cases with zero send/recieve data
      
      * Start on partition object
      
      * Get compiling
      
      * Updates
      
      * Add unit tests
      
      * Switch to partition object
      
      * Fix linting issues
      
      * Rename partition file
      
      * Add python doc
      
      * Fix python assert and finish doxygen comments
      
      * Remove stubs for range based partition to satisfy pylint
      
      * Wrap unit test in GPU only
      
      * Wrap explicit cuda call in ifdef
      
      * Merge with partition.py
      
      * update docstrings
      
      * Cleanup partition_op
      
      * Add Workspace object
      
      * Switch to using workspace object
      
      * Move last remainder based function out of nccl_api
      
      * Add error messages
      
      * Update docs with examples
      
      * Fix linting erros
      Co-authored-by: default avatarxiang song(charlie.song) <classicxsong@gmail.com>
      ae8dbe6d