Optimize low latency combine send with TMA (#299)
* feat: low latency combine inplace TMA
* optimize tma pointer with PatternVisitor
* Minor cleanup
* Add `elect_one_sync`

---------

Co-authored-by: Zhean Xu <xza@deepseek.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>