    [Enhancements] Several bugfixes and refactoring of dynamic generic reduction (#1156) · dfb80c4e
    Qianfeng authored
    * Squashed 'src/composable_kernel/' content from commit f6edda61
    
    git-subtree-dir: src/composable_kernel
    git-subtree-split: f6edda61
    
    * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
    
    * Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5
    
    5781adf5 Update develop (#5) (#6)
    97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
    7b1ec41e refactor
    49c33aae refactor
    54b3e73d rename
    
    git-subtree-dir: src/composable_kernel
    git-subtree-split: 5781adf5
    
    
    
    * fix
    
    * refactor
    
    * remove online compilation from CK
    
    * refactor
    
    * fix
    
    * add ctest
    
    * tidy
    
    * add tidy
    
    * tidy
    
    * tidy
    
    * tidy
    
    * tidy
    
    * tidy
    
    * tidy
    
    * tidy
    
    * tidy
    
    * tidy
    
    * add c-style pointer cast
    
    * vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
    
    * fix clang warning suppression
    
    * tidy
    
    * suppress cppcheck
    
    * fix enum issue
    
    * revert changes to hip build
    
    * fix kernel filename
    
    * update CK build script
    
    * rename
    
    * rename
    
    * make inner product compatible on gfx900
    
    * Update src/include/miopen/solver/ck_utility_common.hpp
    Co-authored-by: JD <Jehandad.Khan@amd.com>
    
    * compiler parameter use stream
    
    * use int instead of index_t in kernel wrapper
    
    * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
    
    * refactor
    
    * refactor
    
    * change cmakelist
    
    * change ck common utility
    
    * fix
    
    * Squashed 'src/composable_kernel/' changes from 5781adf5..31b40352
    
    31b40352 Merge pull request #16 from ROCmSoftwarePlatform/develop
    b62bf8c3 Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration
    ccc4a1d3 Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration
    67ad47e7 refactor
    16effa76 refactor
    a91b68df DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
    2cbabbba use int instead of index_t in kernel wrapper
    0834bc76 compiler parameter use stream
    f2ac7832 make inner product compatible on gfx900
    4e57b30a rename
    c03045ce rename
    b2589957 update CK build script
    2c48039d fix kernel filename
    d626dccc fix enum issue
    643ebd4f tidy
    ddd49ec9 fix clang warning suppression
    4f566c62 vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
    172036d7 add c-style pointer cast
    76f31319 tidy
    d1842890 tidy
    f885c131 tidy
    80120f0a tidy
    c3efeb5e tidy
    56fc0842 tidy
    54fba515 tidy
    e62bae7a tidy
    24c87289 add tidy
    61487e0a fix
    ae98b52a remove online compilation from CK
    cb954213 refactor
    73ca9701 Merge commit '437cc595c6e206dfebb118985b5171bbc1e29eab' into composable_kernel_init_integration_v3
    3b866461 Merge pull request #7 from ROCmSoftwarePlatform/master
    d09ea4f4 Update develop (#5)
    3d32ae94 add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
    
    git-subtree-dir: src/composable_kernel
    git-subtree-split: 31b40352
    
    
    
    * Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel
    
    * Fix with regard to implementing GetZeroVal() in both kernel and host
    
    * Avoid converting to compType from dstDataType before writing the output value
    
    * Add half_t support to NumericLimits and make GetZeroVal() of the binary operators constexpr
    
    * Add CONSTANT decorator for descriptor read buffer
    
    * Use get_thread_local_1d_id() for thread local Id
    
    * Rename GetZeroVal() to GetReductionZeroVal() in the kernels
    
    * Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp
    
    * Occasional tiny simplification and update in the kernel files
    
    * Update in src/reducetensor.cpp for consistent IDs passing to the kernel
    
    * Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
    
    * Update to remove OpenCL tidy checking failures
    
    * Small updates in src/reducetensor.cpp
    
    * Update for better readability
    
    * Remove unused code and unneeded template parameters in the kernel wrappers
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
    Co-authored-by: JD <Jehandad.Khan@amd.com>
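
Several entries above concern the reduction "zero value" (GetZeroVal(), later renamed GetReductionZeroVal()). As a rough illustration of the idea only, the sketch below shows how a reduction operator can expose the identity element that seeds its accumulator. All names here (AddOp, MaxOp, MinOp, reduce_all) are hypothetical and written for host-side C++; they are not taken from composable_kernel or MIOpen, whose actual kernel code may be organized differently.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical operator types; not the CK or MIOpen API.
template <typename T>
struct AddOp {
    // Identity of addition: x + 0 == x.
    static constexpr T GetReductionZeroVal() { return T(0); }
    constexpr T operator()(T a, T b) const { return a + b; }
};

template <typename T>
struct MaxOp {
    // Identity of max: the lowest representable value of T.
    static constexpr T GetReductionZeroVal() { return std::numeric_limits<T>::lowest(); }
    constexpr T operator()(T a, T b) const { return std::max(a, b); }
};

template <typename T>
struct MinOp {
    // Identity of min: the largest representable value of T.
    static constexpr T GetReductionZeroVal() { return std::numeric_limits<T>::max(); }
    constexpr T operator()(T a, T b) const { return std::min(a, b); }
};

// Reduce every element of `data` with Op, starting from Op's identity value.
// An empty input therefore returns the identity itself.
template <typename Op, typename T>
T reduce_all(const std::vector<T>& data) {
    Op op;
    T acc = Op::GetReductionZeroVal();
    for (const T& v : data) {
        acc = op(acc, v);
    }
    return acc;
}
```

Calling `reduce_all<MaxOp<float>>(values)` picks the operator explicitly and deduces the element type from the vector. Keeping the identity next to the operator means the same seed value is available wherever the operator type is visible, which is one plausible reason to define it once for both host and kernel code.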
reduction_common.hpp 1.78 KB