"vscode:/vscode.git/clone" did not exist on "59484a6fb482160c54d6d89d7324dc66c1d6fc79"
Use TMA to optimize internode combine. (#287)
* Let forwarders use a dedicated SM
* Shuffle rdma idx
* Senders use TMA.
* Adjust the tuning chunk size.
* Modify NVL chunk layout.
* Update some combine config.
* Small lint
* Minor fix
* Overlap TMA store
---------
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>