added LDS double buffering (on the C dimension) for implicit GEMM v1r3; as a result, it should reach 90% of peak for all filter sizes with the CHWN format
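The commit message does not include code. What follows is only a minimal host-side C++ model of the general technique, not the repository's kernel. It assumes a 1x1 filter, so the convolution reduces to a GEMM over C (`out[k][n] = sum_c wei[c][k] * in[c][n]`). The names `kCPerBlock`, `LdsBuffer`, `load_chunk`, `gemm_double_buffered`, and `gemm_reference`, along with the row-major layouts and the chunk size, are all hypothetical. The two buffers stand in for two LDS regions: while chunk `i` of C is consumed from one buffer, chunk `i+1` is written into the other.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reduction-chunk size along C (not taken from the repository).
constexpr int kCPerBlock = 8;

// Stand-in for one LDS region holding a C-chunk of weights and inputs.
struct LdsBuffer {
    std::vector<float> wei;  // kCPerBlock x K, row-major
    std::vector<float> in;   // kCPerBlock x N, row-major
    int len = 0;             // number of valid C rows in this chunk
};

// Copy rows [c0, c0 + len) of wei (C x K) and in (C x N) into buf.
// On a GPU this would be the global-memory -> LDS copy.
void load_chunk(LdsBuffer& buf, const std::vector<float>& wei,
                const std::vector<float>& in, int C, int K, int N, int c0) {
    buf.len = std::min(kCPerBlock, C - c0);
    for (int c = 0; c < buf.len; ++c) {
        for (int k = 0; k < K; ++k) buf.wei[c * K + k] = wei[(c0 + c) * K + k];
        for (int n = 0; n < N; ++n) buf.in[c * N + n] = in[(c0 + c) * N + n];
    }
}

// Reduce over C chunk by chunk, ping-ponging between two buffers.
std::vector<float> gemm_double_buffered(const std::vector<float>& wei,
                                        const std::vector<float>& in,
                                        int C, int K, int N) {
    assert(C >= 0 && K > 0 && N > 0);
    std::vector<float> out(static_cast<std::size_t>(K) * N, 0.0f);
    LdsBuffer buf[2];
    for (auto& b : buf) {
        b.wei.resize(static_cast<std::size_t>(kCPerBlock) * K);
        b.in.resize(static_cast<std::size_t>(kCPerBlock) * N);
    }
    const int num_chunks = (C + kCPerBlock - 1) / kCPerBlock;
    if (num_chunks == 0) return out;

    load_chunk(buf[0], wei, in, C, K, N, 0);  // prologue: fill buffer 0
    for (int i = 0; i < num_chunks; ++i) {
        const LdsBuffer& cur = buf[i % 2];
        // Prefetch the next chunk into the other buffer. On a GPU these loads
        // would be in flight while the math below runs, and one barrier per
        // iteration typically suffices (a single buffer usually needs two).
        if (i + 1 < num_chunks)
            load_chunk(buf[(i + 1) % 2], wei, in, C, K, N, (i + 1) * kCPerBlock);
        for (int c = 0; c < cur.len; ++c)
            for (int k = 0; k < K; ++k)
                for (int n = 0; n < N; ++n)
                    out[k * N + n] += cur.wei[c * K + k] * cur.in[c * N + n];
    }
    return out;
}

// Plain reduction over C, used as a reference.
std::vector<float> gemm_reference(const std::vector<float>& wei,
                                  const std::vector<float>& in,
                                  int C, int K, int N) {
    std::vector<float> out(static_cast<std::size_t>(K) * N, 0.0f);
    for (int c = 0; c < C; ++c)
        for (int k = 0; k < K; ++k)
            for (int n = 0; n < N; ++n)
                out[k * N + n] += wei[c * K + k] * in[c * N + n];
    return out;
}

// Largest elementwise absolute difference between two result vectors.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    float m = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}
```

On real hardware, the benefit comes from hiding global-memory latency behind arithmetic and from using fewer barriers per reduction step. This sequential model only checks that the ping-pong indexing, including a final partial chunk, reproduces the plain reduction over C.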