  1. 15 Nov, 2023 1 commit
    • Fmha pr 2 (#26) · 3753c4bc
      carlushuang authored
      * support hdim=64/128 in same example code
      
      * support v transpose
      
      * revert gemm.cpp; it was not intended to be modified
      
      * remove useless code
      
      * fix a bug for swizzle C encoding, no perf change
      
      * optimize LDS encoding
      
      * update LDS layout
      
      * clean up code
  2. 03 Nov, 2023 1 commit
  3. 27 Oct, 2023 1 commit
    • support batch & nhead, and scale (#20) · 95889861
      carlushuang authored
      * support batch & nhead
      
      * support scale
      
      * tile scheduler
      
      * rename tile-scheduler to tile-partitioner
      
      * add some exp2 math
      
      * fix a bug when changing tile size
  4. 19 Oct, 2023 1 commit
    • add fmha fwd pipeline (#17) · 9f36ac7c
      carlushuang authored
      
      
      * Revert "Extract gemm0 prefetch0 out from loop"
      
      This reverts commit d3b56f39f9fd12edb476b24ae9cf480841d311e4.
      
      * add fmha fwd pipeline
      
      * Extract gemm0 prefetch0 out from loop
      
      * move blockSize to another place; fix a missing header in tile_window_impl_static_distribution.hpp
      
      * remove KArgs from tile modules
      
      ---------
      Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>