  1. 02 Jun, 2026 1 commit
  2. 29 May, 2026 2 commits
  3. 23 May, 2026 1 commit
  4. 13 May, 2026 1 commit
  5. 23 Mar, 2026 1 commit
  6. 23 Jan, 2026 1 commit
  7. 15 Jan, 2026 1 commit
  8. 11 Dec, 2025 1 commit
  9. 24 Oct, 2025 1 commit
  10. 17 Oct, 2025 1 commit
  11. 24 Sep, 2025 1 commit
  12. 10 Sep, 2025 2 commits
  13. 05 Aug, 2025 1 commit
  14. 31 Jul, 2025 1 commit
  15. 15 Jul, 2025 3 commits
  16. 12 Jul, 2025 1 commit
  17. 04 Jul, 2025 1 commit
    • Use TMA to optimize internode dispatch. (#276) · a2fa3b73
      Shangyan Zhou authored
      
      
      * Add TMA buffer allocation
      
      * Use TMA for forwarders and NVL receivers
      
      * Use lane 31 to operate TMA.
      
      * Change rdma buffer layout.
      
      * Use TMA to transfer scales also.
      
      * Increase the NVL recv buffer size.
      
      * Disable early stopping.
      
      * Apply similar optimizations on receiver warps.
      
      * Prevent warp divergence.
      
      * Disable aggressive ptx by default.
      
      * Revert using TMA to transfer scales.
      
      * Format.
      
      * Change the layout of dispatch NVL buffer.
      
      * Move topk transformation to recv warps.
      
      * Use TMA to transfer all data in forward warps
      
      * Use TMA to store scales.
      
      * Code lint
      
      ---------
      Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
  18. 27 Jun, 2025 1 commit
  19. 11 Jun, 2025 1 commit
    • Support Ampere architecture (#204) · b8d90fb7
      Chenggang Zhao authored
      * Update README
      
      * Update `setup.py`
      
      * Fix headers
      
      * Add `DISABLE_NVSHMEM` for APIs
      
      * Fix launch
      
      * Fix TMA settings
      
      * Fix TMA usages
      
      * Fix dlink
      
      * Separate layout kernels
      
      * Update version
      
      * Add `is_sm90_compiled`
      
      * Fix tests
      
      * Add NVLink connection checks
      
      * Update README
      
      * Fix tests
      
      * Add some comments
      
      * Minor fix
      
      * Minor fix
      
      * Fix bugs
  20. 19 May, 2025 1 commit
  21. 25 Feb, 2025 1 commit