Optimize intranode combine. (#247)
* Increase the test round.
* Add warp synchronization.
* Shuffle the send warps.
* Add elapsed time to the bench result.