1. 08 Jul, 2025 1 commit
  2. 07 Jul, 2025 1 commit
  3. 04 Jul, 2025 1 commit
    • Use TMA to optimize internode dispatch. (#276) · a2fa3b73
      Shangyan Zhou authored
      
      
      * Add TMA buffer allocation
      
      * Use TMA for forwarders and NVL receivers
      
      * Use lane 31 to operate TMA.
      
      * Change rdma buffer layout.
      
      * Use TMA to transfer scales also.
      
      * Increase the NVL recv buffer size.
      
      * Disable early stopping.
      
      * Apply similar optimizations on receiver warps.
      
      * Prevent warp divergence.
      
      * Disable aggressive ptx by default.
      
      * Revert using TMA to transfer scales.
      
      * Format.
      
      * Change the layout of dispatch NVL buffer.
      
      * Move topk transformation to recv warps.
      
      * Use TMA to transfer all data in forward warps
      
      * Use TMA to store scales.
      
      * Code lint
      
      ---------
      Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
  4. 02 Jul, 2025 4 commits
  5. 27 Jun, 2025 6 commits
  6. 26 Jun, 2025 1 commit
  7. 25 Jun, 2025 1 commit
  8. 24 Jun, 2025 3 commits
  9. 23 Jun, 2025 1 commit
  10. 20 Jun, 2025 1 commit
  11. 18 Jun, 2025 1 commit
  12. 16 Jun, 2025 4 commits
  13. 13 Jun, 2025 2 commits
  14. 12 Jun, 2025 1 commit
  15. 11 Jun, 2025 2 commits
  16. 10 Jun, 2025 1 commit
  17. 09 Jun, 2025 3 commits
  18. 06 Jun, 2025 1 commit
    • Use TMA instead of LD/ST for intra-node normal kernels (#191) · c8dceba1
      Chenggang Zhao authored
      * Update CMake files
      
      * Use TMA instead of LD/ST for intranode dispatch
      
      * Use TMA instead of LD/ST for intranode combine
      
      * Adjust configs
      
      * Test default configs as well
      
      * More warps for combine
      
      * Add inter-thread fence
      
      * Enable more warps
      
      * Do not use TMA for senders
      
      * Update configs
      
      * Remove useless wait
  19. 03 Jun, 2025 1 commit
  20. 28 May, 2025 1 commit
  21. 23 May, 2025 2 commits
  22. 12 May, 2025 1 commit