1. 30 Oct, 2023 1 commit
  2. 27 Oct, 2023 1 commit
    • support batch & nhead, and scale (#20) · 95889861
      carlushuang authored
      * support batch & nhead
      
      * support scale
      
      * tile scheduler
      
      * rename tile-scheduler to tile-partitioner
      
      * add some exp2 math
      
      * fix a bug when changing tile size
  3. 19 Oct, 2023 3 commits
  4. 18 Oct, 2023 1 commit
    • Pre-compute coordinates to speed up store_tile() for TileWindowWithStaticDistribution<> (#12) · 63bc96e3
      Po Yen Chen authored
      
      
      * Extract store_tile() logic as method
      
      * Extract load_tile() logic as method
      
      * Rename type alias
      
      * Extract common logic as traits
      
      * Remove unnecessary access specifier
      
      * Add ComputeMode for TileWindowWithStaticDistribution
      
      * Put field check into Traits
      
      * More definition of Traits types
      
      * Use clearer static_assert() message
      
      * Enable pre-compute coordinates in store_tile()
      
      * Reformat static assert
      
      * Undo changes to the wrong method
      
      * Enable pre-compute coords for store_tile()
      
      * Remove static_vector usage
      
      * Add method to move non-member coordinates
      
      * Force using pre-computed coordinates in Store()
      
      * Fix wrong access for SFC_Ys
      
      * Change comment
      
      * Allow users to hint # access per coord
      
      * Add comment for noting remove data members later
      
      * Unify FIXME comments
      
      * Replace FIXME comments by TODO
      
      * Let user specify HintNumCoords
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor load/store for window
      
      * clean
      
      * clean
      
      * bug fix for window; clean
      
      ---------
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  5. 12 Oct, 2023 2 commits
    • Refactor 1010 (#14) · 7337ec25
      Chao Liu authored
      * refactor
      
      * refactor
      
      * change load_tile, update block gemm
      
      * debug
      
      * clean
      
      * clean
      
      * experiment lod
      
      * workaround spilling issue
      
      * clean
    • slice kv, and use 3d padding LDS layout (#15) · 7b1a0b7f
      carlushuang authored
      * slice kv, and use 3d padding LDS layout
      
      * add missing sync
      
      * put sync in another place
      
      * move sync place
      
      * revert to normal
  6. 06 Oct, 2023 1 commit
    • add tensor slicing API (#7) · 6491acda
      carlushuang authored
      
      
      * add tensor slicing API
      
      * remove redundant ck namespace
      
      * better gemm_gemm interface
      
      * modify gemm_gemm
      
      * add slice_tile api
      
      * fix merge bug
      
      * update to 3d padding, since we no longer need that much LDS size
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      ---------
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  7. 03 Oct, 2023 1 commit
    • Shuffle in thread (#13) · 1cf54e86
      Chao Liu authored
      * adding in-thread shuffle
      
      * update softmax example
      
      * refactor grid gemm
      
      * refactor gemm: layouts
      
      * bug fix
      
      * clean
      
      * clean
  8. 14 Sep, 2023 2 commits
  9. 13 Sep, 2023 1 commit
  10. 05 Sep, 2023 2 commits