Canonicalize TMA usages (#410)

* Remove redundant TMA flushes
* Less barrier initialization overhead
* Simplify `elect_one_sync`
* Use `elect_one_sync` instead of lanes
* Minor fix
* Polish testing prints
* Refactor for internode kernels
* Better performance