  1. 07 Jul, 2025 1 commit
  2. 04 Jul, 2025 2 commits
    • Update some dispatch configs. · e6d61fc6
      Shangyan Zhou authored
    • Use TMA to optimize internode dispatch. (#276) · a2fa3b73
      Shangyan Zhou authored

      * Add TMA buffer allocation
      
      * Use TMA for forwarders and NVL receivers
      
      * Use lane 31 to operate TMA.
      
      * Change rdma buffer layout.
      
      * Use TMA to transfer scales also.
      
      * Increase the NVL recv buffer size.
      
      * Disable early stopping.
      
      * Apply similar optimizations on receiver warps.
      
      * Prevent warp divergence.
      
      * Disable aggressive ptx by default.
      
      * Revert using TMA to transfer scales.
      
      * Format.
      
      * Change the layout of dispatch NVL buffer.
      
      * Move topk transformation to recv warps.
      
      * Use TMA to transfer all data in forward warps
      
      * Use TMA to store scales.
      
      * Code lint
      
      ---------
      Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
  3. 02 Jul, 2025 14 commits
  4. 30 Jun, 2025 1 commit
  5. 27 Jun, 2025 7 commits
  6. 26 Jun, 2025 1 commit
  7. 25 Jun, 2025 2 commits
  8. 24 Jun, 2025 3 commits
  9. 23 Jun, 2025 2 commits
  10. 20 Jun, 2025 1 commit
  11. 18 Jun, 2025 6 commits