    Update to gemm_reduce and batched_gemm_reduce (#213) · c77ae65d
    Qianfeng authored
    * [Experimental] Change to gemm+reduce and batched-gemm+reduce
    
    * Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel
    
    * Tiny fix in device_batched_gemm_xdl.hpp
    
    * clang-format library/src/utility/conv_fwd_util.cpp
profile_batched_gemm_reduce_impl.hpp 15 KB