composable_kernel/include/utility/reduction_common.hpp · b2dc55f82c635dc9a0a512ca3f476e1d825b0a8c · yangql / composable_kernel-1

[MIOpen Downstream] Fix Reduction Kernel (#34) · b2dc55f8

Qianfeng authored Oct 07, 2021



* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel

* Fix with regard to implementing GetZeroVal() in both kernel and host

* Avoid converting to compType from dstDataType before writing the output value

* Add half_t support to NumericLimits and make GetZeroVal() of the binary operators constexpr

* Add CONSTANT decorator for descriptor read buffer

* Use get_thread_local_1d_id() for thread local Id

* Rename GetZeroVal() to GetReductionZeroVal() in the kernels

* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp

* Occasional tiny simplification and update in the kernel files

* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers

* Update to remove OpenCL tidy checking failures

* Update for better readability

* Remove unused code and unneeded template parameters in the kernel wrappers

Co-authored-by: Chao Liu <chao.liu2@amd.com>
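
For readers unfamiliar with the "zero value" of a reduction mentioned in the items above, here is a minimal host-only sketch of the idea. It is not the code from this commit. The `sketch` namespace, the `reduce_all` helper, and the use of `std::numeric_limits` in place of the repository's `NumericLimits` are assumptions made for illustration. Only the name `GetReductionZeroVal()` is taken from the commit message. Each binary operator exposes a `constexpr` identity element that the accumulator starts from.

```cpp
#include <limits>
#include <vector>

namespace sketch { // hypothetical namespace, not part of composable_kernel

// Sum reduction: the identity ("zero value") is 0.
template <typename T>
struct Add
{
    static constexpr T GetReductionZeroVal() { return static_cast<T>(0); }
    constexpr void operator()(T& acc, T v) const { acc = acc + v; }
};

// Max reduction: the identity is the lowest representable value of T.
// std::numeric_limits stands in for the repository's NumericLimits (assumption).
template <typename T>
struct Max
{
    static constexpr T GetReductionZeroVal() { return std::numeric_limits<T>::lowest(); }
    constexpr void operator()(T& acc, T v) const
    {
        if(acc < v)
            acc = v;
    }
};

// Hypothetical helper: reduce a whole vector, starting from the operator's zero value.
template <typename Op, typename T>
T reduce_all(const std::vector<T>& in)
{
    T acc = Op::GetReductionZeroVal(); // plain (non-constexpr) variable initialized from a constexpr call
    Op op{};
    for(T v : in)
        op(acc, v);
    return acc;
}

} // namespace sketch
```

Because the zero value is the operator's identity, reducing an empty input returns that value unchanged, and the same operator definition can in principle be shared between host reference code and kernels.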


reduction_common.hpp 1.78 KB
