Added implicit GEMM v1r3; refactored the decomposition of the wei (weight) tensor to loop over y, x first and C second, which allows easy LDS double buffering on C
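
The loop order described above can be sketched on the CPU as follows. This is only an illustration, not the kernel from this commit: the struct `ConvShape`, the functions `conv_reference` and `conv_yx_outer_c_inner`, the parameter `c_per_block`, the KCYX/CHW layouts, and the stride-1, no-padding convolution are all assumptions made for the example. Two ordinary host buffers stand in for the two LDS buffers; on a GPU the load of the next C chunk would overlap with the math on the current one, whereas here the two steps simply run one after the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical problem description (not from the commit): stride 1, no padding,
// so the input is C x (Ho + Y - 1) x (Wo + X - 1).
struct ConvShape { int K, C, Y, X, Ho, Wo; };

// Plain reference: out[k][ho][wo] = sum over c, y, x of wei[k][c][y][x] * in[c][ho+y][wo+x].
std::vector<float> conv_reference(const ConvShape& s,
                                  const std::vector<float>& in,
                                  const std::vector<float>& wei)
{
    const int Hi = s.Ho + s.Y - 1, Wi = s.Wo + s.X - 1;
    std::vector<float> out(static_cast<std::size_t>(s.K) * s.Ho * s.Wo, 0.0f);
    for (int k = 0; k < s.K; ++k)
        for (int ho = 0; ho < s.Ho; ++ho)
            for (int wo = 0; wo < s.Wo; ++wo)
                for (int c = 0; c < s.C; ++c)
                    for (int y = 0; y < s.Y; ++y)
                        for (int x = 0; x < s.X; ++x)
                            out[(k * s.Ho + ho) * s.Wo + wo] +=
                                wei[((k * s.C + c) * s.Y + y) * s.X + x] *
                                in[(c * Hi + ho + y) * Wi + wo + x];
    return out;
}

// Sketch of the reordered loops: (y, x) outermost, C chunks innermost.
// For a fixed (y, x) the weight slice is a K x C matrix, and the walk over C
// is a simple linear sequence of chunks, which makes ping-pong buffering easy.
std::vector<float> conv_yx_outer_c_inner(const ConvShape& s,
                                         const std::vector<float>& in,
                                         const std::vector<float>& wei,
                                         int c_per_block)
{
    assert(c_per_block > 0 && s.C % c_per_block == 0);
    const int Hi = s.Ho + s.Y - 1, Wi = s.Wo + s.X - 1;
    const int num_chunks = s.C / c_per_block;
    std::vector<float> out(static_cast<std::size_t>(s.K) * s.Ho * s.Wo, 0.0f);

    // Two staging buffers standing in for the two LDS buffers.
    std::vector<float> buf[2] = {std::vector<float>(s.K * c_per_block),
                                 std::vector<float>(s.K * c_per_block)};

    for (int y = 0; y < s.Y; ++y) {
        for (int x = 0; x < s.X; ++x) {
            // Copy one K x c_per_block chunk of the (y, x) weight slice.
            auto load_chunk = [&](int chunk, std::vector<float>& dst) {
                for (int k = 0; k < s.K; ++k)
                    for (int c = 0; c < c_per_block; ++c) {
                        const int cg = chunk * c_per_block + c; // global channel
                        dst[k * c_per_block + c] =
                            wei[((k * s.C + cg) * s.Y + y) * s.X + x];
                    }
            };

            load_chunk(0, buf[0]); // preload the first chunk
            for (int i = 0; i < num_chunks; ++i) {
                // Stage the next chunk into the idle buffer before computing.
                if (i + 1 < num_chunks) load_chunk(i + 1, buf[(i + 1) % 2]);

                const std::vector<float>& cur = buf[i % 2];
                for (int k = 0; k < s.K; ++k)
                    for (int c = 0; c < c_per_block; ++c)
                        for (int ho = 0; ho < s.Ho; ++ho)
                            for (int wo = 0; wo < s.Wo; ++wo)
                                out[(k * s.Ho + ho) * s.Wo + wo] +=
                                    cur[k * c_per_block + c] *
                                    in[((i * c_per_block + c) * Hi + ho + y) * Wi + wo + x];
            }
        }
    }
    return out;
}
```

With C as the innermost blocked loop, each iteration only needs the current chunk and the next one, so two buffers suffice regardless of the filter size Y by X.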