    Fmha pr 2 (#26) · 3753c4bc
    carlushuang authored
    * support hdim=64/128 in same example code
    
    * support v transpose
    
    * revert gemm.cpp; modifying it was not intended
    
    * remove useless code
    
    * fix a bug in the swizzle C encoding (no perf change)
    
    * optimize LDS encoding
    
    * update LDS layout
    
    * clean up code
fmha_fwd_kernel.hpp 9.08 KB