    Gemm reduce max (#209) · 0ffe956a
    rocking5566 authored
    
    
    * [What] Rename the example
    [Why] Prepare to add unary reduction
    
    * Add global operation to the parameter
    
    * Add atomicmax
    
    * Fix compile error
    
    * Support atomicMax (hip library)
    
    * Rename the reduction example
    
    * Fix target name
    
    * use p_d1_grid as the indicator directly
    
    * Prevent performance issue. Let passthrough handle it.
    
    * Implement the function template and specialize it for float2
    
    * No need to separate into two lines
    
    * Remove empty line
    
    * add comment
    
    * Fix compile error due to merge from develop
    
    * make the implementation of atomic_max / atomic_add explicit for each datatype
    
    * Refine typo
    
    * For future CI test
    
    * Fix compiler error in ckProfiler
    
    * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'
    
    * simply use remove_pointer
    
    * Rename type and var
    
    * Refine example
    
    * Modify reducemax example
    
    * Fix bug in reduction
    
    * Change initialize range
    
    * Implement F64 version of atomicMax
    
    * Move reduction code together
    
    * Add buffer atomic_max
    
    * Fix coding style by clang-format
    
    * Integrate new api of DeviceGemmReduce_Xdl_CShuffle
    
    * Integrate Batch gemm reduction
    
    * Fix example
    
    * fix example
    
    * clean up
    
    * Fix batch gemm tensor operation
    
    * Fix coding style
    
    * Fix template argument
    
    * Fix clang format
    
    * Keep the flexibility of a different stride for each D tensor
    
    * Fix compile error for ckProfiler
    
    * Fix typo
    
    * [What] Fix naming
    [Why] Prepare to add out elementop
    
    * Add DoutElementOp
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
    Co-authored-by: rocking <chunylai@amd.com>
amd_buffer_addressing.hpp 50.7 KB