Use TMA instead of LD/ST for intra-node normal kernels (#191)
* Update CMake files
* Use TMA instead of LD/ST for intranode dispatch
* Use TMA instead of LD/ST for intranode combine
* Adjust configs
* Test default configs as well
* More warps for combine
* Add inter-thread fence
* Enable more warps
* Do not use TMA for senders
* Update configs
* Remove useless wait