    [GEMM] gemm_universal related optimization (#1453) · 3049b546
    Haocong WANG authored
    
    
    * replace buffer_atomic with global_atomic
    
    * fixed global_atomic_add
    
    * added bf16 atomic_add
    
    * format
    
    * clang-format-12
    
    * clean
    
    * clean
    
    * add guards
    
    * Update gtest.cmake
    
    * enabled splitk_gemm_multi_d
    
    * format
    
    * add ckProfiler
    
    * format
    
    * fixed naming
    
    * format
    
    * clean
    
    * clean
    
    * add guards
    
    * fix clang format
    
    * format
    
    * add kbatch printout
    
    * clean
    
    * Add rocm6.2 related gemm optimization
    
    * Limit bf16 atomic usage
    
    * remove redundant RCR gemm_universal instance
    
    * Add RRR fp8 gemm universal instance
    
    * Bug fix
    
    * Add GPU_TARGET guard to FP8/BF8 target
    
    * bug fix
    
    * update cmake
    
    * remove all fp8/bf8 examples if the arch does not support them
    
    * Enable fp8 RRR support in ckProfiler
    
    * limit greedy-reverse flag to gemm_universal in ckProfiler
    
    ---------
    Co-authored-by: Jing Zhang <jizhan@fb.com>
    Co-authored-by: Jing Zhang <jizhan@meta.com>
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
    Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
    Co-authored-by: illsilin <Illia.Silin@amd.com>
profile_gemm_universal.cpp 6.9 KB