    [MIOpen Downstream] Fix Reduction Kernel (#34) · b2dc55f8
    Qianfeng authored
    
    
    * Tiny fix in the use of data type template parameters in the blockwise and direct_threadwise kernels
    
    * Fix the implementation of GetZeroVal() in both the kernel and host code
    
    * Avoid converting from dstDataType to compType before writing the output value
    
    * Add half_t support to NumericLimits and make GetZeroVal() of the binary operators constexpr
    
    * Add CONSTANT decorator for descriptor read buffer
    
    * Use get_thread_local_1d_id() for the thread-local ID
    
    * Rename GetZeroVal() to GetReductionZeroVal() in the kernels
    
    * Remove constexpr from the initialized zeroVal and make a tiny fix in reduction_operator.hpp
    
    * Minor simplifications and updates in the kernel files
    
    * Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
    
    * Update to fix OpenCL tidy check failures
    
    * Update for better readability
    
    * Remove unused code and unneeded template parameters in the kernel wrappers
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
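    The zero-value changes listed in this commit (a constexpr GetZeroVal()/GetReductionZeroVal() on the binary operators, NumericLimits support, and a non-constexpr initialized zeroVal) can be illustrated with the host-only C++ sketch below. It makes several assumptions that the commit does not confirm. Each binary reduction operator is assumed to expose its identity element through a static constexpr GetReductionZeroVal(). NumericLimits is assumed to wrap std::numeric_limits. The operator names Add, Mul, Max and Min, the reduce() helper, and this NumericLimits layout are hypothetical and are not the library's actual definitions. A half_t specialization is left out because it depends on the library's own 16-bit float type.

    ```cpp
    #include <cassert>
    #include <cstdio>
    #include <limits>

    // Hypothetical NumericLimits wrapper (not the library's real definition).
    // A half_t specialization would be needed because std::numeric_limits has no
    // entry for a custom 16-bit float type; it is omitted to keep this portable.
    template <typename T>
    struct NumericLimits
    {
        static constexpr T Max() { return std::numeric_limits<T>::max(); }
        static constexpr T Lowest() { return std::numeric_limits<T>::lowest(); }
    };

    // Each hypothetical binary operator exposes its identity ("zero") value as a
    // constexpr function, so it can be evaluated at compile time.
    template <typename T>
    struct Add
    {
        static constexpr T GetReductionZeroVal() { return T(0); }
        void operator()(T& acc, T v) const { acc = acc + v; }
    };

    template <typename T>
    struct Mul
    {
        static constexpr T GetReductionZeroVal() { return T(1); }
        void operator()(T& acc, T v) const { acc = acc * v; }
    };

    template <typename T>
    struct Max
    {
        static constexpr T GetReductionZeroVal() { return NumericLimits<T>::Lowest(); }
        void operator()(T& acc, T v) const { if(v > acc) acc = v; }
    };

    template <typename T>
    struct Min
    {
        static constexpr T GetReductionZeroVal() { return NumericLimits<T>::Max(); }
        void operator()(T& acc, T v) const { if(v < acc) acc = v; }
    };

    // Reduce n values. The accumulator is initialized from the constexpr zero
    // value but is itself a plain (non-constexpr) variable, since the loop updates it.
    template <typename Op, typename T>
    T reduce(const T* data, int n)
    {
        T acc = Op::GetReductionZeroVal();
        Op op{};
        for(int i = 0; i < n; ++i)
            op(acc, data[i]);
        return acc;
    }

    int main()
    {
        const float x[] = {3.0f, -1.5f, 4.0f, 2.0f};
        static_assert(Add<int>::GetReductionZeroVal() == 0, "identity of Add is 0");
        assert(reduce<Add<float>>(x, 4) == 7.5f);
        assert(reduce<Max<float>>(x, 4) == 4.0f);
        std::printf("sum=%g prod=%g max=%g min=%g\n",
                    reduce<Add<float>>(x, 4), reduce<Mul<float>>(x, 4),
                    reduce<Max<float>>(x, 4), reduce<Min<float>>(x, 4));
        return 0;
    }
    ```

    Returning the identity from a constexpr function lets the same operator type be used in host and kernel code without a runtime lookup. The accumulator itself has to stay mutable.
    
    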
reduction_common.hpp 1.78 KB