1. 12 May, 2022 1 commit
    • JD's avatar
      Add host API (#220) · cec69bc3
      JD authored
      
      
      * Add host API
      
      * manually rebase on develop
      
      * clean
      
      * manually rebase on develop
      
      * exclude tests from all target
      
      * address review comments
      
      * update client app name
      
      * fix missing lib name
      
      * clang-format update
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix test issue
      
      * refactor
      
      * refactor
      
      * refactor
      
      * upate cmake and readme
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      cec69bc3
  2. 30 Mar, 2022 1 commit
    • Jianfeng Yan's avatar
      Batched gemm and reduction (#156) · 34c661e7
      Jianfeng Yan authored
      * adding batched_gemm_and_reduction
      
      * batched_gemm_reduce works with bactch_count=1
      
      * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
      
      * adding profiler for batched_gemm_fp16
      
      * fixed a bug in declaration of d1 and d0; both example and profiler work
      
      * clang-format
      
      * cleanup
      
      * batched_gemm_reduce: add test
      
      * minor change
      
      * fixed some typo in function names
      34c661e7
  3. 24 Mar, 2022 1 commit
    • Chao Liu's avatar
      Gemm+Reduce Fusion (#128) · f95267f1
      Chao Liu authored
      * add gridwise gemm v4r1
      
      * rename
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * use sfc in shuffling
      
      * remove hardcode
      
      * remove hardcode
      
      * refactor
      
      * fix build
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * format
      
      * clean
      
      * adding gemm+reduce
      
      * adding profiler for gemm+reduce
      
      * adding gemm+reduce profiler
      
      * fix build
      
      * clean up
      
      * gemm+reduce
      
      * fix build
      
      * update DeviceGemm_Xdl_CShuffle; update enum to enum class
      
      * clean up
      
      * add test for gemm+reduce
      
      * clean up
      
      * refactor
      
      * fix build
      
      * fix build
      f95267f1