[CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779)
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
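The following is a rough C++ illustration of one way a larger block can do redundant work. It uses a simplified, assumed model, not the actual CK_TILE reduce kernel: one block of `block_size` threads reduces a row of `n` elements with a strided loop, then combines the per-thread partial results with a pairwise tree reduction. The helper names (`ceil_div`, `elems_per_thread`, `wasted_slots`, `tree_steps`) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model (assumption, not CK_TILE code): a block of block_size
// threads strides over a row of n elements, so each thread performs
// ceil(n / block_size) loop passes, some of which may fall past the row end.
constexpr std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
    return (a + b - 1) / b;
}

// Number of strided passes each thread makes over the row.
constexpr std::int64_t elems_per_thread(std::int64_t n, std::int64_t block_size) {
    return ceil_div(n, block_size);
}

// Thread-slots issued minus useful elements: slots that do no useful work.
constexpr std::int64_t wasted_slots(std::int64_t n, std::int64_t block_size) {
    return block_size * elems_per_thread(n, block_size) - n;
}

// Pairwise steps of an in-block tree reduction (block_size a power of two);
// each step typically needs a block-wide synchronization.
constexpr int tree_steps(std::int64_t block_size) {
    int steps = 0;
    while (block_size > 1) {
        block_size /= 2;
        ++steps;
    }
    return steps;
}

// Compile-time checks for a 640-element row under this model.
static_assert(wasted_slots(640, 256) == 128);
static_assert(wasted_slots(640, 512) == 384);
static_assert(tree_steps(256) == 8 && tree_steps(512) == 9);
```

Under this model, a 640-element row leaves 128 idle thread-slots over 3 passes with 256 threads, but 384 idle slots over 2 passes with 512 threads, plus one extra tree-reduction step. How much of the observed speedup this accounts for depends on the tile shapes the example actually uses.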