- 23 Dec, 2025 1 commit
-
-
lishen authored
-
- 15 Dec, 2025 1 commit
-
-
lishen authored
-
- 03 Dec, 2025 1 commit
-
-
lishen authored
-
- 25 Nov, 2025 1 commit
-
-
lishen authored
-
- 05 Nov, 2025 1 commit
-
-
lishen authored
-
- 03 Nov, 2025 1 commit
-
-
lishen authored
-
- 30 Oct, 2025 1 commit
-
-
lishen authored
-
- 24 Oct, 2025 1 commit
-
-
lijian6 authored
2. Modify the test script to reduce GPU memory usage, from 17G to 8G. Signed-off-by: lijian <lijian6@sugon.com>
-
- 17 Oct, 2025 1 commit
-
-
lijian6 authored
Signed-off-by: lijian <lijian6@sugon.com>
-
- 24 Sep, 2025 1 commit
-
-
Tailing Yuan authored
Co-authored-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
-
- 22 Sep, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 15 Sep, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 10 Sep, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 25 Aug, 2025 1 commit
-
-
sky authored
Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 19 Aug, 2025 1 commit
-
-
Tailing Yuan authored
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
-
- 30 Jul, 2025 1 commit
-
-
sky authored
* Add diagnosis module for precise identification of slow ranks * Add tests for the slow diagnosis module * Update some comments for diagnose * Update test case for diagnose * Strip the diagnose module, thx LyricZhao and sphish. * update variable name and cumulative wait recv cost, thx sphish. * remove invalid comments. --------- Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 22 Jul, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 21 Jul, 2025 1 commit
-
-
Guangguan Wang authored
* Add arg `--pressure-test` for `test_low_latency.py` * Export `NVSHMEM_QP_DEPTH` --------- Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
-
- 15 Jul, 2025 1 commit
-
-
Seth Howell authored
This enables the CPU-Assisted data path. Signed-off-by: Seth Howell <sethh@nvidia.com>
-
- 11 Jul, 2025 1 commit
-
-
Shangyan Zhou authored
* Explicitly destroy the C++ runtime and release resources. * Small fix * fix typo * Add a flag to control whether explicit `destroy()` is required.
-
- 10 Jul, 2025 2 commits
-
-
Shangyan Zhou authored
* Let forwarders use a dedicated SM * Shuffle rdma idx * Sender use TMA. * Adjust the tuning chunk size. * Modify NVL chunk layout. * Update some combine config. * Small lint * Minor fix * Overlap TMA store --------- Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
-
Chenggang Zhao authored
* Add LogFMT interface * Update comments * Add simulated code * Fix comments * Change to 128 channels * Add notes * Optimize performance * optimize simulate logfmt 10bit * Minor fix * Stronger low latency tests * Minor fix * Stronger low latency tests for logfmt * Optimize logfmt simulate: lg2/ex2 ptx, step_inv * Minor fix * Minor fix * Add non-logfmt bench * Fix value=0 for logfmt * Optimize performance * Refactor tests --------- Co-authored-by: Zhean Xu <xza@deepseek.com>
-
- 04 Jul, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 27 Jun, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 25 Jun, 2025 2 commits
-
-
Shangyan Zhou authored
* Support bias. * Fix. * Fix style.
-
Shangyan Zhou authored
* Add `get_comm_stream`. * Fix style.
-
- 16 Jun, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 12 Jun, 2025 1 commit
-
-
Shifang Xu authored
-
- 11 Jun, 2025 3 commits
-
-
Chenggang Zhao authored
* Update README * Update `setup.py` * Fix headers * Add `DISABLE_NVSHMEM` for APIs * Fix launch * Fix TMA settings * Fix TMA usages * Fix dlink * Separate layout kernels * Update version * Add `is_sm90_compiled` * Fix tests * Add NVLink connection checks * Update README * Fix tests * Add some comments * Minor fix * Minor fix * Fix bugs
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
- 09 Jun, 2025 2 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
* Add low-latency kernel usage flag * Update comments
-
- 07 Jun, 2025 1 commit
-
-
fzyzcjy authored
-
- 06 Jun, 2025 2 commits
-
-
Chenggang Zhao authored
* Update CMake files * Use TMA instead of LD/ST for intranode dispatch * Use TMA instead of LD/ST for intranode combine * Adjust configs * Test default configs as well * More warps for combine * Add inter-thread fence * Enable more warps * Do not use TMA for senders * Update configs * Remove useless wait
-
Shangyan Zhou authored
Co-authored-by: Shangyan Zhou <sy.zhou@deepseek.com>
-
- 23 May, 2025 3 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
cywork121 authored
-
- 22 Apr, 2025 1 commit
-
-
Chenggang Zhao authored
-