- 10 Jun, 2025 3 commits
-
-
Shangyan Zhou authored
remove the dependency of gdrcopy
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 09 Jun, 2025 4 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Chenggang Zhao authored
* Add low-latency kernel usage flag
* Update comments
-
Chenggang Zhao authored
-
- 08 Jun, 2025 1 commit
-
-
Shangyan Zhou authored
Allow using MNNVL
-
- 07 Jun, 2025 1 commit
-
-
fzyzcjy authored
-
- 06 Jun, 2025 2 commits
-
-
Chenggang Zhao authored
* Update CMake files
* Use TMA instead of LD/ST for intranode dispatch
* Use TMA instead of LD/ST for intranode combine
* Adjust configs
* Test default configs as well
* More warps for combine
* Add inter-thread fence
* Enable more warps
* Do not use TMA for senders
* Update configs
* Remove useless wait
-
Shangyan Zhou authored
Co-authored-by: Shangyan Zhou <sy.zhou@deepseek.com>
-
- 05 Jun, 2025 3 commits
-
-
Chenggang Zhao authored
-
Shangyan Zhou authored
-
Shangyan Zhou authored
Fix notify_dispatch: using warp 0 to issue send
-
- 03 Jun, 2025 1 commit
-
-
wzc.wuzhicheng authored
Signed-off-by: wzc.wuzhicheng <wzc.wuzhicheng@linux.alibaba.com>
-
- 28 May, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 23 May, 2025 4 commits
-
-
Chenggang Zhao authored
-
Shangyan Zhou authored
Low-latency P2P code cleanup and bug fixes
-
Chenggang Zhao authored
-
cywork121 authored
-
- 19 May, 2025 1 commit
-
-
guyueh1 authored
* Add 10.0 to TORCH_CUDA_ARCH_LIST
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
* Revert csrc/CMakeLists change; in setup.py make TORCH_CUDA_ARCH_LIST configurable
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
---------
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
-
- 12 May, 2025 4 commits
-
-
Chenggang Zhao authored
Support hidden size 4096
-
sleepcoo authored
Co-authored-by: zhyncs <me@zhyncs.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
-
Shangyan Zhou authored
Feat: enhance nvidia peer memory detection
-
Chenggang Zhao authored
Shuffling the starting index of target rank for different ranks and channels
-
- 10 May, 2025 1 commit
-
-
wangfakang authored
To mitigate incast congestion, shuffle the starting index of the target rank for different ranks and channels
Signed-off-by: wangfakang <fakangwang@gmail.com>
-
- 09 May, 2025 1 commit
-
-
Vico Chu authored
-
- 08 May, 2025 2 commits
-
-
Chenggang Zhao authored
Fix: DeepEP could not be used together with code that needs the GIL, such as the Mooncake transfer engine
-
fzyzcjy authored
-
- 29 Apr, 2025 1 commit
-
-
fzyzcjy authored
-
- 27 Apr, 2025 2 commits
-
-
Shangyan Zhou authored
Add Infrawaves' fork to README.
-
Shangyan Zhou authored
-
- 22 Apr, 2025 6 commits
-
-
Chenggang Zhao authored
Support multi-QP for normal kernels
-
Shangyan Zhou authored
-
Shangyan Zhou authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
-
Shangyan Zhou authored
-
- 21 Apr, 2025 2 commits
-
-
moningchen authored
Merge branch 'trmt/internode_multi_qp' of github.com:deepseek-ai/DeepEP into trmt/internode_multi_qp
-
moningchen authored
-