- 15 Sep, 2025 2 commits
-
-
Yizhi Wang authored
-
Chenggang Zhao authored
-
- 11 Sep, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 10 Sep, 2025 6 commits
-
-
Shangyan Zhou authored
* Suppress kineto output
* Add pressure test mode
* Add `x_pure_rand_e4m3` test
* Add more results into hash value
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
* Fix mbarrier
* Remove redundant store
-
- 09 Sep, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 01 Sep, 2025 3 commits
-
-
Chongchong Tian authored
Each thread is responsible for one target rank
-
fzyzcjy authored
* nits
* hack unrolled warp copy
* Revert "nits" This reverts commit 3e1b28d9b17f2c1cc46403d432ca576dbf15bd45.
-
fzyzcjy authored
-
- 28 Aug, 2025 1 commit
-
-
sky authored
* Fix: avoid floating point exception. Signed-off-by: wangfakang <fakangwang@gmail.com>
* simplify the code. Signed-off-by: wangfakang <fakangwang@gmail.com>
---------
Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 26 Aug, 2025 1 commit
-
-
Zhiyi Hu authored
* fix combine timeout due to forwarder min head update
* Update head before and after combine_token; add assertion for nvl_buffer_size_per_rdma_rank
---------
Co-authored-by: zhiyi Hu <zhiyihu@U-NYQQMGK0-2250.local>
-
- 25 Aug, 2025 3 commits
-
-
fzyzcjy authored
-
Thunderbrook authored
-
sky authored
Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 19 Aug, 2025 1 commit
-
-
Tailing Yuan authored
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
-
- 14 Aug, 2025 1 commit
-
-
Yizhi Wang authored
-
- 11 Aug, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 08 Aug, 2025 2 commits
-
-
AlphaBaby authored
Co-authored-by: fujianhao.fjh <fujianhao.fjh@antgroup.com>
-
Shangyan Zhou authored
-
- 07 Aug, 2025 3 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Zhean Xu authored
* independent logfmt_simulate function
* draft: logfmt low latency combine
* Minor bug fixes
* Fix non-logfmt bugs
* Fix logfmt bugs
* Fix logfmt bugs
* Minor fix
* Minor fix
* Clean code
* Clean code
* Use fewer regs
* Use two warp groups
* Correct shared memory size
* Minor fix
* Minor fix
* More rigorous tests
* Clean code
* Use more SMs
* Use different unroll factor for send & recv
* Update csrc/kernels/internode_ll.cu Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update csrc/kernels/internode_ll.cu Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Some renaming
* Some comments of tests
* Format `logfmt_encode`
* More lints
* Some refactors on sends
* Fix testing
* Fix bugs
* Renaming
* Use the full warp
* Unify combine recv
* Lint
* Lint
* Support 2560
* Fix meta buffer dtype
* Better encode calls
* Better amin/max writes
* Extra sync
* Read `topk_idx` by once
* Better specialization
* Read weights by once
* Rename
* Bug fixed
* Some renaming
* Fix local memory usage for sending
* Fix local memory usage for receiving
* Less writes
* Optimize performance
* Optimize performance
* Better performance
* Optimize performance
* Fix rounding
* Manually unroll
* Fix bench
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
-
- 05 Aug, 2025 1 commit
-
-
windreamer authored
-
- 01 Aug, 2025 1 commit
-
-
fzyzcjy authored
-
- 31 Jul, 2025 4 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
-
- 30 Jul, 2025 1 commit
-
-
sky authored
* Add diagnosis module for precise identification of slow ranks Signed-off-by: wangfakang <fakangwang@gmail.com>
* Add tests for the slow diagnosis module Signed-off-by: wangfakang <fakangwang@gmail.com>
* Update some comments for diagnose Signed-off-by: wangfakang <fakangwang@gmail.com>
* Update test case for diagnose Signed-off-by: wangfakang <fakangwang@gmail.com>
* Strip the diagnose module, thx LyricZhao and sphish. Signed-off-by: wangfakang <fakangwang@gmail.com>
* update variable name and cumulative wait recv cost, thx sphish. Signed-off-by: wangfakang <fakangwang@gmail.com>
* remove invalid comments. Signed-off-by: wangfakang <fakangwang@gmail.com>
---------
Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 29 Jul, 2025 2 commits
-
-
Jee Jee Li authored
* Done Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
* Add comment Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
---------
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
-
Void authored
Fix the address of dispatch_rdma_recv_count_buffer to avoid cleaning after each change in hidden_size/token_num. (#313)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
-
- 22 Jul, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 21 Jul, 2025 3 commits
-
-
Zhiyi Hu authored
Co-authored-by: zhiyi Hu <zhiyihu@U-NYQQMGK0-2250.local>
-
Guangguan Wang authored
* Add arg --pressure-test for test_low_latency.py
  Add arg --pressure-test for test_low_latency.py
  Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
* Export NVSHMEM_QP_DEPTH
  Export NVSHMEM_QP_DEPTH
  Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
---------
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
-
Shangyan Zhou authored
* Remove an outdated todo
* Increase the number of combine forward warps.
* forwarder use TMA.
* Small fix
* Code lint
---------
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
-
- 18 Jul, 2025 1 commit
-
-
Shangyan Zhou authored
Fix data errors and kernel hangs caused by in-flight RDMA channel head updates
-