- 16 Jul, 2025 3 commits
-
-
Shangyan Zhou authored
-
Shangyan Zhou authored
third-party: Improvements to NVSHMEM Integration
-
Shangyan Zhou authored
* Fix rdma head movement
* Optimize `cached_notify` by using TMA.
* Fix
* Small fix
-
- 15 Jul, 2025 8 commits
-
-
Seth Howell authored
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
Shangyan Zhou authored
correct the wqe_idx in rdma write wqe
-
Guangguan authored
correct the wqe_idx in rdma write wqe when num_wqes > 1 in nvshmemi_ibgda_put_nbi_warp.
Signed-off-by: Guangguan <guangguan.wang@linux.alibaba.com>
-
Seth Howell authored
This enables the CPU-Assisted data path.
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
Seth Howell authored
Responding to review comments.
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
Seth Howell authored
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
Seth Howell authored
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
Seth Howell authored
This will give consumers an opportunity to update their builds.
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
- 14 Jul, 2025 4 commits
-
-
Shangyan Zhou authored
-
Shangyan Zhou authored
* Strengthen the barrier in `cached_notify`
* lint
* Change the timing method
* lint
-
Chenggang Zhao authored
-
Zhean Xu authored
* feat: low latency combine inplace TMA
* optimize tma pointer with PatternVisitor
* Minor cleanup
* Add `elect_one_sync`
---------
Co-authored-by: Zhean Xu <xza@deepseek.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
-
- 12 Jul, 2025 3 commits
-
-
Seth Howell authored
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
Seth Howell authored
This allows users to use NVSHMEM without setting the driver regkey.
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
Seth Howell authored
NVSHMEM 3.3 and above support the host-side features in the patch.
Note: Removed recv queue support
Signed-off-by: Seth Howell <sethh@nvidia.com>
-
- 11 Jul, 2025 4 commits
-
-
Shangyan Zhou authored
ibgda: support non-bond dual-port environments via multi-port config
-
root authored
fix format
-
Shangyan Zhou authored
-
Shangyan Zhou authored
* Explicitly destroy the C++ runtime and release resources.
* Small fix
* fix typo
* Add a flag to control whether explicit `destroy()` is required.
-
- 10 Jul, 2025 2 commits
-
-
Shangyan Zhou authored
* Let forwarders use a dedicated SM
* Shuffle rdma idx
* Sender use TMA.
* Adjust the tuning chunk size.
* Modify NVL chunk layout.
* Update some combine config.
* Small lint
* Minor fix
* Overlap TMA store
---------
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
-
Chenggang Zhao authored
* Add LogFMT interface
* Update comments
* Add simulated code
* Fix comments
* Change to 128 channels
* Add notes
* Optimize performance
* optimize simulate logfmt 10bit
* Minor fix
* Stronger low latency tests
* Minor fix
* Stronger low latency tests for logfmt
* Optimize logfmt simulate: lg2/ex2 ptx, step_inv
* Minor fix
* Minor fix
* Add non-logfmt bench
* Fix value=0 for logfmt
* Optimize performance
* Refactor tests
---------
Co-authored-by: Zhean Xu <xza@deepseek.com>
-
- 09 Jul, 2025 1 commit
-
-
liuhe authored
-
- 08 Jul, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 07 Jul, 2025 1 commit
-
-
Zhean Xu authored
Co-authored-by: Zhean Xu <xza@deepseek.com>
-
- 04 Jul, 2025 2 commits
-
-
Shangyan Zhou authored
-
Shangyan Zhou authored
* Add TMA buffer allocation
* Use TMA for forwarders and NVL receivers
* Use lane 31 to operate TMA.
* Change rdma buffer layout.
* Use TMA to transfer scales also.
* Increase the NVL recv buffer size.
* Disable early stopping.
* Apply similar optimizations on receiver warps.
* Prevent warp divergence.
* Disable aggressive ptx by default.
* Revert using TMA to transfer scales.
* Format.
* Change the layout of dispatch NVL buffer.
* Move topk transformation to recv warps.
* Use TMA to transfer all data in forward warps
* Use TMA to store scales.
* Code lint
---------
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
-
- 02 Jul, 2025 11 commits
-
-
Chenggang Zhao authored
-
youkaichao authored
* use cli arg for num_processes
Signed-off-by: youkaichao <youkaichao@gmail.com>
* update low-latency
Signed-off-by: youkaichao <youkaichao@gmail.com>
* update intranode
Signed-off-by: youkaichao <youkaichao@gmail.com>
* update internode
Signed-off-by: youkaichao <youkaichao@gmail.com>
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
fzyzcjy authored
* more
* more
* more
* more
* more
* more
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
fzyzcjy authored
-
fzyzcjy authored
-