Speed up global memory reading for GEMM instances (#813)
* Use better ThreadClusterLengths to speed up global memory reads
* Update the B tile reading pattern for the layout=NN instance