- 02 Jul, 2025 12 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
fzyzcjy authored
* more
* more
* more
* more
* more
* more
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
fzyzcjy authored
-
fzyzcjy authored
-
Chenggang Zhao authored
-
ruizhang1230 authored
* support hidden size 8192
* refactor code
* fix assert
-
Zhiyi Hu authored
Co-authored-by: zhiyi Hu <zhiyihu@U-NYQQMGK0-2250.local>
-
- 30 Jun, 2025 1 commit
-
-
Shangyan Zhou authored
Enhance warp copy
-
- 27 Jun, 2025 7 commits
-
-
alpha-baby authored
-
alpha-baby authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
* Remove memory fence in NVLink barrier.
* Move `__syncthreads` and fence into barrier.
* Fix bugs
---------
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>
-
Shangyan Zhou authored
-
Shangyan Zhou authored
-
Chenggang Zhao authored
-
- 26 Jun, 2025 1 commit
-
-
Shangyan Zhou authored
-
- 25 Jun, 2025 2 commits
-
-
Shangyan Zhou authored
* Support bias.
* Fix.
* Fix style.
-
Shangyan Zhou authored
* Add `get_comm_stream`.
* Fix style.
-
- 24 Jun, 2025 3 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
* Add draft
* Add fast-debugging flags
* Fix several bugs
* Add sender timeout checks
* Fix stuck
* Fix bugs
* Fix bugs
-
Shangyan Zhou authored
* Increase the number of test rounds.
* Add warp synchronization.
* Shuffle the send warps.
* Add elapsed time to the bench result.
-
- 23 Jun, 2025 2 commits
- 20 Jun, 2025 1 commit
-
-
Chenggang Zhao authored
-
- 18 Jun, 2025 6 commits
-
-
Chenggang Zhao authored
-
Chenggang Zhao authored
-
Shangyan Zhou authored
Set `device_id` to suppress a PyTorch warning.
-
Shangyan Zhou authored
-
Shangyan Zhou authored
-
Shangyan Zhou authored
* Fix the tail loading issue.
* Modify the sync offset.
-
- 16 Jun, 2025 4 commits
-
-
Shangyan Zhou authored
* Fix warp synchronization.
* Another fix.
-
Chenggang Zhao authored
-
Chenggang Zhao authored
* Add automatic warp count control for low-latency dispatch
* Add automatic warp count control for low-latency combine
* More assertions
-
fzyzcjy authored
-
- 13 Jun, 2025 1 commit
-
-
Shangyan Zhou authored
-