1. change blockwise GEMM loop order from kmn to mnk (~1% improvement)
2. change kernel timing mode to 50 warmup runs + 50 timed repeats
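
The two changes above can be sketched on the host as follows. This is only an illustration under stated assumptions, not the code from this commit: `gemm_blockwise_mnk` and `time_kernel_ms` are hypothetical names, "mnk" is read as the block loop nesting from outermost to innermost, and a host `std::chrono` clock stands in for whatever device-side timer the real kernel launcher uses.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a blockwise GEMM: C (MxN) += A (MxK) * B (KxN),
// all row-major. Blocks are visited in m -> n -> k order (assumed meaning of "mnk"),
// so every k-block contributing to one C tile is processed before the next tile.
void gemm_blockwise_mnk(const std::vector<float>& a, const std::vector<float>& b,
                        std::vector<float>& c, std::size_t M, std::size_t N,
                        std::size_t K, std::size_t block)
{
    assert(block > 0);
    assert(a.size() == M * K && b.size() == K * N && c.size() == M * N);

    for (std::size_t m0 = 0; m0 < M; m0 += block)
        for (std::size_t n0 = 0; n0 < N; n0 += block)
            for (std::size_t k0 = 0; k0 < K; k0 += block)
            {
                const std::size_t m1 = std::min(m0 + block, M);
                const std::size_t n1 = std::min(n0 + block, N);
                const std::size_t k1 = std::min(k0 + block, K);

                // Accumulate the partial product of this k-block into the C tile.
                for (std::size_t m = m0; m < m1; ++m)
                    for (std::size_t n = n0; n < n1; ++n)
                    {
                        float acc = c[m * N + n];
                        for (std::size_t k = k0; k < k1; ++k)
                            acc += a[m * K + k] * b[k * N + n];
                        c[m * N + n] = acc;
                    }
            }
}

// Calls fn n_warmup times untimed, then n_repeat times timed, and returns the
// average wall-clock time per timed call in milliseconds.
template <typename Fn>
double time_kernel_ms(Fn&& fn, int n_warmup = 50, int n_repeat = 50)
{
    assert(n_warmup >= 0 && n_repeat > 0);

    for (int i = 0; i < n_warmup; ++i)
        fn(); // warmup: results discarded, caches and clocks settle

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_repeat; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count() / n_repeat;
}
```

A caller would wrap the GEMM in a lambda, for example `time_kernel_ms([&] { gemm_blockwise_mnk(a, b, c, M, N, K, 64); })`, where the block size 64 is arbitrary. Because this sketch accumulates into `c`, repeated timed calls keep adding to the output, so correctness should be checked in a separate single run on a zeroed `c`.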