- 07 Nov, 2025 1 commit
-
-
lishen authored
-
- 06 Nov, 2025 1 commit
-
-
lijian6 authored
2. Add internode ll mode. 3. Add a test for internode ll mode. Signed-off-by: lijian <lijian6@sugon.com>
-
- 05 Nov, 2025 2 commits
- 03 Nov, 2025 1 commit
-
-
lishen authored
-
- 30 Oct, 2025 1 commit
-
-
lishen authored
-
- 24 Oct, 2025 1 commit
-
-
lijian6 authored
2. Modify the test script to reduce GPU memory usage, from 17G to 8G. Signed-off-by: lijian <lijian6@sugon.com>
-
- 21 Oct, 2025 2 commits
-
-
lijian6 authored
Signed-off-by: lijian <lijian6@sugon.com>
-
lijian6 authored
backup pass way. Signed-off-by: lishen <lishen@sugon.com>
-
- 20 Oct, 2025 2 commits
-
-
lijian6 authored
Signed-off-by: lijian <lijian6@sugon.com>
-
lijian6 authored
Signed-off-by: lijian <lijian6@sugon.com>
-
- 17 Oct, 2025 1 commit
-
-
lijian6 authored
Signed-off-by: lijian <lijian6@sugon.com>
-
- 24 Sep, 2025 1 commit
-
-
Tailing Yuan authored
Co-authored-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
-
- 22 Sep, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 17 Sep, 2025 1 commit
-
-
Shangyan Zhou authored
* Fix hidden_size % 128 != 0
* Add `align_down()` function
* Use the full warp to wait TMA store
* Support arbitrary hidden sizes in fp8 cast
* lint
-
- 16 Sep, 2025 1 commit
-
-
Chenggang Zhao authored
* Remove redundant TMA flushes
* Less barrier initialization overhead
* Simplify `elect_one_sync`
* Use `elect_one_sync` instead of lanes
* Minor fix
* Polish testing prints
* Refactor for internode kernels
* Better performance
-
- 15 Sep, 2025 2 commits
-
-
Yizhi Wang authored
-
Chenggang Zhao authored
-
- 11 Sep, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 10 Sep, 2025 6 commits
-
-
Shangyan Zhou authored
* Suppress kineto output
* Add pressure test mode
* Add `x_pure_rand_e4m3` test
* Add more results into hash value
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
* Fix mbarrier
* Remove redundant store
-
- 09 Sep, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 01 Sep, 2025 3 commits
-
-
Chongchong Tian authored
Each thread is responsible for one target rank
-
fzyzcjy authored
* nits
* hack unrolled warp copy
* Revert "nits"
  This reverts commit 3e1b28d9b17f2c1cc46403d432ca576dbf15bd45.
-
fzyzcjy authored
-
- 28 Aug, 2025 1 commit
-
-
sky authored
* Fix: avoid floating point exception. Signed-off-by: wangfakang <fakangwang@gmail.com>
* simplify the code. Signed-off-by: wangfakang <fakangwang@gmail.com>
---------
Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 26 Aug, 2025 1 commit
-
-
Zhiyi Hu authored
* fix combine timeout due to forwarder min head update
* Update head before and after combine_token; add assertion for nvl_buffer_size_per_rdma_rank
---------
Co-authored-by: zhiyi Hu <zhiyihu@U-NYQQMGK0-2250.local>
-
- 25 Aug, 2025 3 commits
-
-
fzyzcjy authored
-
Thunderbrook authored
-
sky authored
Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 19 Aug, 2025 1 commit
-
-
Tailing Yuan authored
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
-
- 14 Aug, 2025 1 commit
-
-
Yizhi Wang authored
-
- 11 Aug, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 08 Aug, 2025 2 commits
-
-
AlphaBaby authored
Co-authored-by: fujianhao.fjh <fujianhao.fjh@antgroup.com>
-
Shangyan Zhou authored
-
- 07 Aug, 2025 1 commit
-
-
Chenggang Zhao authored
-