- 26 Feb, 2026 1 commit
-
-
lishen authored
-
- 05 Feb, 2026 1 commit
-
-
lishen authored
-
- 29 Jan, 2026 1 commit
-
-
lijian6 authored
Signed-off-by: lijian <lijian6@sugon.com>
-
- 17 Oct, 2025 1 commit
-
-
lijian6 authored
Signed-off-by: lijian <lijian6@sugon.com>
-
- 24 Sep, 2025 1 commit
-
-
Tailing Yuan authored
Co-authored-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
-
- 16 Sep, 2025 1 commit
-
-
Chenggang Zhao authored
* Remove redundant TMA flushes
* Less barrier initialization overhead
* Simplify `elect_one_sync`
* Use `elect_one_sync` instead of lanes
* Minor fix
* Polish testing prints
* Refactor for internode kernels
* Better performance
-
- 11 Jul, 2025 1 commit
-
-
Shangyan Zhou authored
* Explicitly destroy the C++ runtime and release resources.
* Small fix
* Fix typo
* Add a flag to control whether explicit `destroy()` is required.
-
- 02 Jul, 2025 5 commits
-
-
Chenggang Zhao authored
-
youkaichao authored
* use cli arg for num_processes
  Signed-off-by: youkaichao <youkaichao@gmail.com>
* update low-latency
  Signed-off-by: youkaichao <youkaichao@gmail.com>
* update intranode
  Signed-off-by: youkaichao <youkaichao@gmail.com>
* update internode
  Signed-off-by: youkaichao <youkaichao@gmail.com>
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
Chenggang Zhao authored
-
fzyzcjy authored
-
fzyzcjy authored
-
- 24 Jun, 2025 1 commit
-
-
Shangyan Zhou authored
* Increase the test round.
* Add warp synchronization.
* Shuffle the send warps.
* Add time elapsed into bench result.
-
- 12 Jun, 2025 1 commit
-
-
Shifang Xu authored
-
- 11 Jun, 2025 3 commits
-
-
Chenggang Zhao authored
* Update README
* Update `setup.py`
* Fix headers
* Add `DISABLE_NVSHMEM` for APIs
* Fix launch
* Fix TMA settings
* Fix TMA usages
* Fix dlink
* Separate layout kernels
* Update version
* Add `is_sm90_compiled`
* Fix tests
* Add NVLink connection checks
* Update README
* Fix tests
* Add some comments
* Minor fix
* Minor fix
* Fix bugs
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
- 06 Jun, 2025 1 commit
-
-
Chenggang Zhao authored
* Update CMake files
* Use TMA instead of LD/ST for intranode dispatch
* Use TMA instead of LD/ST for intranode combine
* Adjust configs
* Test default configs as well
* More warps for combine
* Add inter-thread fence
* Enable more warps
* Do not use TMA for senders
* Update configs
* Remove useless wait
-
- 11 Apr, 2025 1 commit
-
-
Hao Lin authored
Signed-off-by: Hao Lin <linhaomails@gmail.com>
-
- 10 Apr, 2025 1 commit
-
-
fujianhao.fjh authored
-
- 25 Mar, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 04 Mar, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 03 Mar, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 25 Feb, 2025 1 commit
-
-
Chenggang Zhao authored
-