- 31 Jul, 2025 1 commit
Chenggang Zhao authored
- 30 Jul, 2025 1 commit
sky authored
* Add diagnosis module for precise identification of slow ranks
* Add tests for the slow diagnosis module
* Update some comments for diagnose
* Update test case for diagnose
* Strip the diagnose module, thx LyricZhao and sphish.
* update variable name and cumulative wait recv cost, thx sphish.
* remove invalid comments.

Signed-off-by: wangfakang <fakangwang@gmail.com>
- 14 Jul, 2025 1 commit
Zhean Xu authored
* feat: low latency combine inplace TMA
* optimize tma pointer with PatternVisitor
* Minor cleanup
* Add `elect_one_sync`

Co-authored-by: Zhean Xu <xza@deepseek.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
- 10 Jul, 2025 1 commit
Chenggang Zhao authored
* Add LogFMT interface
* Update comments
* Add simulated code
* Fix comments
* Change to 128 channels
* Add notes
* Optimize performance
* optimize simulate logfmt 10bit
* Minor fix
* Stronger low latency tests
* Minor fix
* Stronger low latency tests for logfmt
* Optimize logfmt simulate: lg2/ex2 ptx, step_inv
* Minor fix
* Minor fix
* Add non-logfmt bench
* Fix value=0 for logfmt
* Optimize performance
* Refactor tests

Co-authored-by: Zhean Xu <xza@deepseek.com>
- 02 Jul, 2025 2 commits
Chenggang Zhao authored
ruizhang1230 authored
* support hidden size 8192
* refactor code
* fix assert
- 23 Jun, 2025 1 commit
fzyzcjy authored
- 16 Jun, 2025 2 commits
Chenggang Zhao authored
Chenggang Zhao authored
* Add automatic warp count control for low-latency dispatch
* Add automatic warp count control for low-latency combine
* More assertions
- 12 Jun, 2025 1 commit
Shifang Xu authored
- 10 Jun, 2025 1 commit
Chenggang Zhao authored
* Fully remove FIFO slots
* Fully remove FIFO buffers
* Minor fix styles
* Fix some typos
* Bugs fixed
* Cleanup `ibgda_poll_cq`
- 09 Jun, 2025 2 commits
Chenggang Zhao authored
Chenggang Zhao authored
* Add low-latency kernel usage flag
* Update comments
- 23 May, 2025 2 commits
Chenggang Zhao authored
cywork121 authored
- 22 Apr, 2025 2 commits
Shangyan Zhou authored
Shangyan Zhou authored
- 07 Apr, 2025 1 commit
Chenggang Zhao authored
- 27 Mar, 2025 1 commit
Chenggang Zhao authored
- 18 Mar, 2025 1 commit
Chenggang Zhao authored
- 14 Mar, 2025 1 commit
Shangyan Zhou authored
- 10 Mar, 2025 1 commit
Chenggang Zhao authored
- 06 Mar, 2025 1 commit
Chenggang Zhao authored
- 03 Mar, 2025 1 commit
Chenggang Zhao authored
- 27 Feb, 2025 1 commit
Chenggang Zhao authored
- 25 Feb, 2025 1 commit
Chenggang Zhao authored