    Fused GEMM+GEMM (#351) · c20a75b0
    Anthony Chang authored
    
    
    * initial stub for gemm_gemm_xdl_cshuffle
    
    * set up example code
    
    * compiles
    
    * prevent integer overflow
    
    * harmonize interface between ref_gemm and ref_batched_gemm
    
    * batched_gemm_gemm
    
    * fix example
    
    * host tensor gen: diagonal pattern in lowest two dimensions only
    
    * make C descriptors contain only integral constants
    
    * clean up
    
    * add BlockwiseGemmXdlops_v2 while exploring a unified approach
    
    * implement proper interface
    
    * tidy up example
    
    * fix compilation warnings
    
    * coarsely controlled 2nd gemm padding
    
    * remove rocm-cmake's hard requirement for a certain revision
    
    * clang-format
    
    * resolve merge conflict
    
    * fix compilation error on gfx10
    
    * add acc0 elementwise op to interface
    
    * add gemm_gemm instances and tests
    
    * avoid LDS data hazard
    
    * fix build
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
CMakeLists.txt 1.85 KB