    Universal gemm splitk using reduce (with multi-d) (#1341) · c544eb4d
    ltqin authored
    
    
    * init for reduce_threadwise multi_d
    
    * add reduce_threadwise_multi_d
    
    * add reduce_multi_d
    
    * clean
    
    * start adding another splitk device op
    
    * add reduce template parameter to SplitKBatchOffset
    
    * add reduce c matrix
    
    * clean up code
    
    * change example data type to bf16
    
    * add bf16Ai8B example
    
    * remove reduce template parameter
    
    * add splitk atomic status to v4
    
    * example add multi d parameters
    
    * device op add multi-d parameters
    
    * add multi-d to reduce
    
    * fix kbatch=1 bug
    
    * change B layout to col in bf16Ai8B example
    
    * remove float adding struct
    
    * change multi-d interface
    
    * change file and class name
    
    * remove multi-d of bf16Ai8B example
    
    * change IsReduce function to IsReduceAdd
    
    * change example layout to RRR from RCR
    
    * set ds stride according to layout
    
    * reset parameter layout
    
    * add gemm universal reduce instance
    
    * add reduce factory
    
    * add profile_gemm_universal_reduce
    
    * add reduce to profiler
    
    * fix reduce instance
    
    * fix profiler reduce compiling bug
    
    * format
    
    * format library instance code
    
    * add mem instance for reduce library
    
    * fix call instance names
    
    * add workspace for reduce in ckProfiler
    
    * format
    
    * add mnpadding to reduce library instance
    
    * add fp16 instance to reduce of profiler
    
    * change copyright time
    
    * restore profiler cmake file
    
    * add reduce text to instances
    
    * add DsLayout and DsDataType to instances template parameter
    
    * fixed gemm_reduce_multi_d
    
    * add an example without multi_d
    
    * Update common.hpp
    
    * Update gtest.cmake
    
    * Update gemm_xdl_splitk_reduce_bf16.cpp
    
    * clean
    
    * Update gtest.cmake
    
    * format
    
    * fix api
    
    * format
    
    * default parameter change to RRR
    
    * add vector_len for multi_d
    
    * format
    
    * Update gtest.cmake
    
    * fix bf16Ai8B elementwiseop
    
    * add ReduceDataType
    
    * move ReduceDataType to end position
    
    * format
    
    * remove googletest git method address
    
    * fix copyright time
    
    * update init data
    
    ---------
    Co-authored-by: root <jizhan@amd.com>
    Co-authored-by: letaoqin <letaoqin@amd.com>
    Co-authored-by: Jing Zhang <jizhan@meta.com>
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
profile_gemm_universal_reduce.cpp 5.42 KB