1. 05 Sep, 2023 1 commit
  2. 31 May, 2023 1 commit
  3. 13 Aug, 2022 1 commit
    • Fused attention (#345) · cac014f1
      Anthony Chang authored
      
      
      * initial stub for gemm_gemm_xdl_cshuffle
      
      * set up example code
      
      * compiles
      
      * prevent integer overflow
      
      * harmonize interface between ref_gemm and ref_batched_gemm
      
      * batched_gemm_gemm
      
      * fix example
      
      * host tensor gen: diagonal pattern in lowest two dimensions only
      
      * make C descriptors contain only integral constants
      
      * clean up
      
      * add BlockwiseGemmXdlops_v2 while exploring a unified approach
      
      * implement proper interface
      
      * tidy up example
      
      * fix compilation warnings
      
      * coarsely controlled 2nd gemm padding
      
      * remove rocm-cmake's hard requirement for certain revision
      
      * clang-format
      
      * resolve merge conflict
      
      * fix compilation error on gfx10
      
      * adds acc0 elementwise op to interface
      
      * attention host validation
      
      * add blockwise softmax v1
      
      * iteratively update softmax+gemm
      
      * transpose both gemm0 and gemm1 xdl output so as to avoid broadcasting softmax max/sum
      
      * add init method for easier debugging
      
      * do away with manual thread cluster calculation
      
      * generalize blockwise softmax interface
      
      * row-wise softmax sum & max
      
      * format
      
      * rename to DeviceBatchedGemmSoftmaxGemm
      
      * add gemm_softmax_gemm instances and tests
      
      * comment
      Co-authored-by: ltqin <letao.qin@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  4. 25 Jun, 2022 1 commit
  5. 20 May, 2022 1 commit
    • Refactor block to C tile map (#235) · a054f7d6
      Anthony Chang authored
      * refactor block-to-ctile-map
      
      * gridwise gemm block2ctile generic validity check
      
      * format
      
      * amend split-k gemm block2ctile map refactor
      
      * add test
      
      * format
      
      * amend
      
      * revert to calculating batch index in kernel instead of passing as block_id_z
      
      * move file
      
      * add valid ctile index check to gridwise v2r4
  6. 09 Mar, 2022 1 commit
    • Reorganize files, Part 1 (#119) · 5d37d7bf
      Chao Liu authored
      * delete obsolete files
      
      * move files
      
      * build
      
      * update cmake
      
      * update cmake
      
      * fix build
      
      * reorg examples
      
      * update cmake for example and test
  7. 19 Aug, 2021 1 commit
    • Composable kernel init integration v3 (#1097) · 6fe3627a
      Chao Liu authored
      * Squashed 'src/composable_kernel/' content from commit f6edda61
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: f6edda61
      
      * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
      
      * Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5
      
      5781adf5 Update develop (#5) (#6)
      97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
      7b1ec41e refactor
      49c33aae refactor
      54b3e73d rename
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: 5781adf5
      
      
      
      * fix
      
      * refactor
      
      * remove online compilation from CK
      
      * refactor
      
      * fix
      
      * add ctest
      
      * add c-style pointer cast
      
      * vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
      
      * fix clang warning suppression
      
      * tidy
      
      * suppress cppcheck
      
      * fix enum issue
      
      * revert changes to HIP build
      
      * fix kernel filename
      
      * update CK build script
      
      * rename
      
      * rename
      
      * make inner product compatible with gfx900
      
      * Update src/include/miopen/solver/ck_utility_common.hpp
      Co-authored-by: JD <Jehandad.Khan@amd.com>
      
      * compiler parameter use stream
      
      * use int instead of index_t in kernel wrapper
      
      * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
      
      * refactor
      
      * refactor
      
      * change cmakelist
      
      * change ck common utility
      
      * fix
      Co-authored-by: JD <Jehandad.Khan@amd.com>
  8. 10 Jun, 2021 1 commit
  9. 25 Mar, 2021 1 commit