1. 29 May, 2026 1 commit
  2. 12 May, 2026 1 commit
  3. 17 Apr, 2026 1 commit
  4. 23 Mar, 2026 1 commit
  5. 09 Feb, 2026 1 commit
  6. 04 Feb, 2026 1 commit
  7. 02 Feb, 2026 1 commit
  8. 15 Jan, 2026 1 commit
  9. 23 Dec, 2025 1 commit
  10. 15 Dec, 2025 1 commit
  11. 03 Dec, 2025 1 commit
  12. 25 Nov, 2025 1 commit
  13. 05 Nov, 2025 1 commit
  14. 03 Nov, 2025 1 commit
  15. 30 Oct, 2025 1 commit
  16. 24 Oct, 2025 1 commit
  17. 17 Oct, 2025 1 commit
  18. 24 Sep, 2025 1 commit
  19. 22 Sep, 2025 1 commit
  20. 15 Sep, 2025 1 commit
  21. 10 Sep, 2025 1 commit
  22. 25 Aug, 2025 1 commit
  23. 19 Aug, 2025 1 commit
  24. 30 Jul, 2025 1 commit
  25. 22 Jul, 2025 1 commit
  26. 21 Jul, 2025 1 commit
  27. 15 Jul, 2025 1 commit
  28. 11 Jul, 2025 1 commit
  29. 10 Jul, 2025 2 commits
    • Use TMA to optimize internode combine. (#287) · 06f417dc
      Shangyan Zhou authored
      
      
      * Let forwarders use a dedicated SM
      
      * Shuffle rdma idx
      
      * Sender use TMA.
      
      * Adjust the tuning chunk size.
      
      * Modify NVL chunk layout.
      
      * Update some combine config.
      
      * Small lint
      
      * Minor fix
      
      * Overlap TMA store
      
      ---------
      Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
    • Support 10-bit LogFMT (simulated version) (#284) · 1cf85fb2
      Chenggang Zhao authored
      
      
      * Add LogFMT interface
      
      * Update comments
      
      * Add simulated code
      
      * Fix comments
      
      * Change to 128 channels
      
      * Add notes
      
      * Optimize performance
      
      * optimize simulate logfmt 10bit
      
      * Minor fix
      
      * Stronger low latency tests
      
      * Minor fix
      
      * Stronger low latency tests for logfmt
      
      * Optimize logfmt simulate: lg2/ex2 ptx, step_inv
      
      * Minor fix
      
      * Minor fix
      
      * Add non-logfmt bench
      
      * Fix value=0 for logfmt
      
      * Optimize performance
      
      * Refactor tests
      
      ---------
      Co-authored-by: Zhean Xu <xza@deepseek.com>
  30. 04 Jul, 2025 1 commit
  31. 27 Jun, 2025 1 commit
  32. 25 Jun, 2025 2 commits
  33. 16 Jun, 2025 1 commit
  34. 12 Jun, 2025 1 commit
  35. 11 Jun, 2025 3 commits
  36. 09 Jun, 2025 1 commit