"...composable_kernel_rocm.git" did not exist on "f95267f166927bee1d806cefbdc142b2e35f640f"
device_implicit_gemm_convolution_1_chwn_csrk_khwn: use tensor copy (instead of a pointwise operation) for writing output; 3x3 improved from 78% to 84%, 5x5 from 80% to 84%