profiler/include/profile_batched_gemm_reduce_impl.hpp · c77ae65d40b1316dac02c4decf02d8517c840be2 · gaoqiong / composable_kernel

Update to gemm_reduce and batched_gemm_reduce (#213) · c77ae65d

Qianfeng authored Apr 30, 2022

* [Experimental] Change to gemm+reduce and batched-gemm+reduce

* Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel

* Tiny fix in device_batched_gemm_xdl.hpp

* clang-format library/src/utility/conv_fwd_util.cpp

profile_batched_gemm_reduce_impl.hpp 15 KB
