  1. 18 Oct, 2023 1 commit
    • Pre-compute coordinates to speed up store_tile() for TileWindowWithStaticDistribution<> (#12) · 63bc96e3
      Po Yen Chen authored
      
      
      * Extract store_tile() logic as method
      
      * Extract load_tile() logic as method
      
      * Rename type alias
      
      * Extract common logic as traits
      
      * Remove unnecessary access specifier
      
      * Add ComputeMode for TileWindowWithStaticDistribution
      
      * Put field check into Traits
      
      * Add more Traits type definitions
      
      * Use clearer static_assert() message
      
      * Enable pre-compute coordinates in store_tile()
      
      * Reformat static assert
      
      * Undo changes to the wrong method
      
      * Enable pre-compute coords for store_tile()
      
      * Remove static_vector usage
      
      * Add method to move non-member coordinates
      
      * Force using pre-computed coordinates in Store()
      
      * Fix wrong access for SFC_Ys
      
      * Change comment
      
      * Allow users to hint the number of accesses per coordinate
      
      * Add comment noting that data members should be removed later
      
      * Unify FIXME comments
      
      * Replace FIXME comments by TODO
      
      * Let user specify HintNumCoords
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor load/store for window
      
      * clean
      
      * clean
      
      * bug fix for window; clean
      
      ---------
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  2. 12 Oct, 2023 1 commit
    • Refactor 1010 (#14) · 7337ec25
      Chao Liu authored
      * refactor
      
      * refactor
      
      * change load_tile, update block gemm
      
      * debug
      
      * clean
      
      * clean
      
      * experiment lod
      
      * workaround spilling issue
      
      * clean
  3. 06 Oct, 2023 1 commit
    • add tensor slicing API (#7) · 6491acda
      carlushuang authored
      
      
      * add tensor slicing API
      
      * remove redundant ck namespace
      
      * better gemm_gemm interface
      
      * modify gemm_gemm
      
      * add slice_tile api
      
      * fix merge bug
      
      * update to 3d padding, since we no longer need that much LDS size
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      ---------
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  4. 03 Oct, 2023 1 commit
    • Shuffle in thread (#13) · 1cf54e86
      Chao Liu authored
      * adding in-thread shuffle
      
      * update softmax example
      
      * refactor grid gemm
      
      * refactor gemm: layouts
      
      * bug fix
      
      * clean
      
      * clean
  5. 14 Sep, 2023 2 commits
  6. 13 Sep, 2023 1 commit
  7. 07 Sep, 2023 1 commit
  8. 05 Sep, 2023 1 commit