  1. 29 Jul, 2025 1 commit
  2. 22 Jul, 2025 1 commit
  3. 21 Jul, 2025 3 commits
  4. 18 Jul, 2025 1 commit
  5. 17 Jul, 2025 1 commit
    • Fix for data error and kernel hung because of inflight rdma channel head update · b65b22ed
      Guangguan authored
      
      
      During dispatch/combine, neither the sender nor the receiver
      waits for the rdma channel head update to finish, which may
      leave head-update WQEs in flight even after the kernel has
      finished. If these inflight WQEs arrive after the rdma
      channel head buffer has been cleaned for the next round of
      dispatch/combine, the head buffer is rewritten with a
      non-zero value. Because of this wrong rdma channel head, the
      rdma sender can reuse the data buffer before the rdma
      receivers have consumed it, causing data errors and a
      kernel hang.
      For performance reasons, so that the RTT of the inflight
      WQEs is overlapped, fix this issue by waiting for all
      previous inflight WQEs to complete before cleaning the rdma
      buffers in the next round of dispatch/combine.
      Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
  6. 16 Jul, 2025 3 commits
  7. 15 Jul, 2025 8 commits
  8. 14 Jul, 2025 4 commits
  9. 12 Jul, 2025 3 commits
  10. 11 Jul, 2025 4 commits
  11. 10 Jul, 2025 2 commits
    • Use TMA to optimize internode combine. (#287) · 06f417dc
      Shangyan Zhou authored
      
      
      * Let forwarders use a dedicated SM
      
      * Shuffle rdma idx
      
      * Sender use TMA.
      
      * Adjust the tuning chunk size.
      
      * Modify NVL chunk layout.
      
      * Update some combine config.
      
      * Small lint
      
      * Minor fix
      
      * Overlap TMA store
      
      ---------
      Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
    • Support 10-bit LogFMT (simulated version) (#284) · 1cf85fb2
      Chenggang Zhao authored
      
      
      * Add LogFMT interface
      
      * Update comments
      
      * Add simulated code
      
      * Fix comments
      
      * Change to 128 channels
      
      * Add notes
      
      * Optimize performance
      
      * optimize simulate logfmt 10bit
      
      * Minor fix
      
      * Stronger low latency tests
      
      * Minor fix
      
      * Stronger low latency tests for logfmt
      
      * Optimize logfmt simulate: lg2/ex2 ptx, step_inv
      
      * Minor fix
      
      * Minor fix
      
      * Add non-logfmt bench
      
      * Fix value=0 for logfmt
      
      * Optimize performance
      
      * Refactor tests
      
      ---------
      Co-authored-by: Zhean Xu <xza@deepseek.com>
  12. 09 Jul, 2025 1 commit
  13. 08 Jul, 2025 1 commit
  14. 07 Jul, 2025 1 commit
  15. 04 Jul, 2025 2 commits
    • Update some dispatch configs. · e6d61fc6
      Shangyan Zhou authored
    • Use TMA to optimize internode dispatch. (#276) · a2fa3b73
      Shangyan Zhou authored
      
      
      * Add TMA buffer allocation
      
      * Use TMA for forwarders and NVL receivers
      
      * Use lane 31 to operate TMA.
      
      * Change rdma buffer layout.
      
      * Use TMA to transfer scales also.
      
      * Increase the NVL recv buffer size.
      
      * Disable early stopping.
      
      * Apply similar optimizations on receiver warps.
      
      * Prevent warp divergence.
      
      * Disable aggressive ptx by default.
      
      * Revert using TMA to transfer scales.
      
      * Format.
      
      * Change the layout of dispatch NVL buffer.
      
      * Move topk transformation to recv warps.
      
      * Use TMA to transfer all data in forward warps
      
      * Use TMA to store scales.
      
      * Code lint
      
      ---------
      Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
  16. 02 Jul, 2025 4 commits
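The race fixed in commit b65b22ed can be illustrated with a toy, single-threaded Python model. This is a hedged sketch, not DeepEP code: `ToyRdmaQueue`, `post_write`, `progress`, `quiet` and `simulate` are hypothetical names, the queue simply delivers posted writes in order whenever `progress` is called, and `quiet` is loosely modeled on OpenSHMEM-style "wait until all outstanding puts complete" semantics. Without a wait, a stale head update from round 1 lands after the round-2 clean; draining the queue first keeps the head at zero.

```python
from collections import deque


class ToyRdmaQueue:
    """Hypothetical in-order queue of posted remote writes (WQEs).

    A posted write is invisible at the target until progress() delivers it.
    """

    def __init__(self):
        self.pending = deque()

    def post_write(self, buf, idx, value):
        # Post a write without waiting for it to complete.
        self.pending.append((buf, idx, value))

    def progress(self, n):
        # Deliver up to n of the oldest in-flight writes.
        for _ in range(min(n, len(self.pending))):
            buf, idx, value = self.pending.popleft()
            buf[idx] = value

    def quiet(self):
        # Block until every previously posted write has landed.
        self.progress(len(self.pending))


def simulate(wait_before_clean):
    """Return the rdma channel head the round-2 sender sees after cleaning."""
    nic = ToyRdmaQueue()
    head = [0]  # rdma channel head buffer on the sender side

    # Round 1: the receiver consumes 4 tokens and posts a head update for each,
    # but the kernel exits without waiting for those updates to finish.
    for consumed in range(1, 5):
        nic.post_write(head, 0, consumed)
    nic.progress(2)  # only two updates have landed when round 1 ends

    # Round 2 begins.
    if wait_before_clean:
        nic.quiet()  # the fix: drain prior-round WQEs before cleaning
    head[0] = 0  # clean the head buffer for the new round
    nic.progress(len(nic.pending))  # any stale updates land after the clean
    return head[0]


print(simulate(wait_before_clean=False))  # 4: stale head left over from round 1
print(simulate(wait_before_clean=True))   # 0: head correctly reset
```

Placing the wait at the start of the next round, rather than at the end of the current one, is what lets the in-flight RTT overlap with other work, which matches the performance rationale given in the commit message.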
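Commit 1cf85fb2 adds a simulated 10-bit LogFMT path without describing the format itself. As a rough illustration of log-domain quantization in general (not DeepEP's actual LogFMT encoding), the Python sketch below assumes one sign bit plus 9 magnitude bits, reserves magnitude code 0 for exact zeros, and spaces the remaining codes uniformly in log2 space between the per-group minimum and maximum. The function names, the bit layout, and the use of a precomputed `step_inv` multiplier are assumptions made for this example.

```python
import math


def logfmt_encode(group, bits=10):
    """Hypothetical log-domain quantizer: returns (sign, code) pairs, lo, step."""
    levels = 2 ** (bits - 1)  # magnitude codes 0..levels-1; code 0 means zero
    logs = [math.log2(abs(v)) for v in group if v != 0.0]
    if not logs:
        return [(0, 0)] * len(group), 0.0, 0.0
    lo, hi = min(logs), max(logs)
    step = (hi - lo) / (levels - 2)  # codes 1..levels-1 span [lo, hi]
    step_inv = 1.0 / step if step > 0 else 0.0  # multiply instead of divide
    encoded = []
    for v in group:
        if v == 0.0:
            encoded.append((0, 0))  # exact zero gets the reserved code
            continue
        code = 1 + round((math.log2(abs(v)) - lo) * step_inv)
        encoded.append((1 if v < 0 else 0, code))
    return encoded, lo, step


def logfmt_decode(encoded, lo, step):
    """Map (sign, code) pairs back to floats using the group's lo and step."""
    out = []
    for sign, code in encoded:
        if code == 0:
            out.append(0.0)
            continue
        mag = 2.0 ** (lo + (code - 1) * step)
        out.append(-mag if sign else mag)
    return out


values = [0.0, 1.5, -0.003, 42.0, -7.25, 1e-4, 0.5]
codes, lo, step = logfmt_encode(values)
print(logfmt_decode(codes, lo, step))
```

Because codes are uniform in the log domain, the rounding error is bounded in relative terms: each decoded magnitude is within a factor of 2^(step/2) of the input, regardless of its absolute size. The sketch only round-trips values; it does not pack the codes into 10-bit words.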