ClementLinCF authored
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
0b8f117f