profiler/include/profile_gemm_reduce_impl.hpp · 0f912e205eec6e349060f2203a8eeabc5e7ba075 · yangql / composable_kernel-1

Update to gemm_reduce and batched_gemm_reduce (#213) · c77ae65d

Qianfeng authored Apr 30, 2022

* [Experimental] Change to gemm+reduce and batched-gemm+reduce

* Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel

* Tiny fix in device_batched_gemm_xdl.hpp

* clang-format library/src/utility/conv_fwd_util.cpp
