1. 20 May, 2022 1 commit
    • Gemm reduce max (#209) · 0ffe956a
      rocking5566 authored
      
      
      * [What] Rename the example
      [Why] Prepare to add unary reduction
      
      * Add global operation to the parameter
      
      * Add atomicMax
      
      * Fix compile error
      
      * Support atomicMax (hip library)
      
      * Rename the reduction example
      
      * Fix target name
      
      * use p_d1_grid as the indicator directly
      
      * Prevent performance issue. Let passthrough handle it.
      
      * Implement the function template that specializes for float2
      
      * No need to separate into two lines
      
      * Remove empty line
      
      * add comment
      
      * Fix compile error due to merge from develop
      
      * make the implementation of atomic_max / atomic_add explicit for each datatype
      
      * Fix typo
      
      * For future CI test
      
      * Fix compiler error in ckProfiler
      
      * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'
      
      * simply use remove_pointer
      
      * Rename type and var
      
      * Refine example
      
      * Modify reducemax example
      
      * Fix bug in reduction
      
      * Change initialize range
      
      * Implement F64 version of atomicMax
      
      * Move reduction code together
      
      * Add buffer atomic_max
      
      * Fix coding style by clang-format
      
      * Integrate new api of DeviceGemmReduce_Xdl_CShuffle
      
      * Integrate Batch gemm reduction
      
      * Fix example
      
      * fix example
      
      * clean up
      
      * Fix batch gemm tensor operation
      
      * Fix coding style
      
      * Fix template argument
      
      * Fix clang format
      
      * Keep the stride of each D tensor flexible
      
      * Fix compile error for ckProfiler
      
      * Fix typo
      
      * [What] Fix naming
      [Why] Prepare to add out elementop
      
      * Add DoutElementOp
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: rocking <chunylai@amd.com>
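Several bullets in this commit concern an f64 atomicMax, which most GPU ISAs do not provide as a native instruction. Below is a host-side sketch of the standard CAS-loop emulation, using `std::atomic` over the 64-bit bit pattern; real device code would spin on HIP's `atomicCAS` instead, and the function name here is illustrative, not the library's actual API.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Sketch of the CAS-loop technique for an f64 atomicMax (illustrative name).
// The double is reinterpreted as a 64-bit integer so a compare-and-swap can
// publish the new maximum; the loop retries if another thread wins the race.
inline double atomic_max_f64(std::atomic<std::uint64_t>& dst, double value)
{
    std::uint64_t expected = dst.load();
    for (;;)
    {
        double current;
        std::memcpy(&current, &expected, sizeof(double));
        if (current >= value)
            return current; // stored value already >= value: nothing to do
        std::uint64_t desired;
        std::memcpy(&desired, &value, sizeof(double));
        // On failure, expected is refreshed with the current contents.
        if (dst.compare_exchange_weak(expected, desired))
            return current;
    }
}
```

Writing this loop out separately per data type, rather than hiding it behind a generic bit trick, matches the spirit of the "explicit for each datatype" bullet above.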
  2. 13 May, 2022 1 commit
  3. 12 May, 2022 1 commit
    • Add host API (#220) · cec69bc3
      JD authored
      
      
      * Add host API
      
      * manually rebase on develop
      
      * clean
      
      * manually rebase on develop
      
      * exclude tests from all target
      
      * address review comments
      
      * update client app name
      
      * fix missing lib name
      
      * clang-format update
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix test issue
      
      * refactor
      
      * refactor
      
      * refactor
      
      * update cmake and readme
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  4. 29 Apr, 2022 1 commit
    • Update to gemm_reduce and batched_gemm_reduce (#213) · c77ae65d
      Qianfeng authored
      * [Experimental] Change to gemm+reduce and batched-gemm+reduce
      
      * Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel
      
      * Tiny fix in device_batched_gemm_xdl.hpp
      
      * clang-format library/src/utility/conv_fwd_util.cpp
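The threadwise-reduce bullet above refers to each thread first collapsing its own register tile of partial results, so the later cross-thread (blockwise) reduction only sees one value per thread. A simplified single-threaded illustration of that first step, with hypothetical names standing in for the kernel's per-thread accumulator tile:

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Illustrative threadwise reduction: max-reduce one thread's tile of
// partial results entirely in "registers" before any cross-thread traffic.
double threadwise_reduce_max(const std::vector<double>& thread_tile)
{
    double acc = -std::numeric_limits<double>::infinity();
    for (double v : thread_tile)
        acc = std::max(acc, v);
    return acc;
}
```

Collapsing the tile locally first shrinks the expensive shared-memory/atomic stage from one operation per element to one per thread.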
  5. 31 Mar, 2022 1 commit
    • Compile for gfx908 and gfx90a (#130) · cd167e49
      Chao Liu authored
      * adding compilation for multiple targets
      
      * fix build
      
      * clean
      
      * update Jenkinsfile
      
      * update readme
      
      * update Jenkins
      
      * use ck::half_t instead of ushort for bf16
      
      * rename enum classes
      
      * clean
      
      * rename
      
      * clean
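Compiling one fat binary for both architectures is typically done by passing multiple `--offload-arch` flags to hipcc/clang; the file names below are illustrative, not the project's actual build invocation:

```shell
# Illustrative hipcc invocation: a single object containing code for both
# gfx908 (MI100) and gfx90a (MI200).
hipcc --offload-arch=gfx908 --offload-arch=gfx90a -O3 -c kernel.cpp -o kernel.o
```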
  6. 30 Mar, 2022 1 commit
    • Batched gemm and reduction (#156) · 34c661e7
      Jianfeng Yan authored
      * adding batched_gemm_and_reduction
      
      * batched_gemm_reduce works with batch_count=1
      
      * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
      
      * adding profiler for batched_gemm_fp16
      
      * fixed a bug in declaration of d1 and d0; both example and profiler work
      
      * clang-format
      
      * cleanup
      
      * batched_gemm_reduce: add test
      
      * minor change
      
      * fixed some typos in function names
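The grid_size bug mentioned above is the classic off-by-batch issue: for a batched GEMM the launch grid must cover every output tile of every batch, not just one matrix. A hypothetical sketch of the corrected computation (names and tile parameters are illustrative):

```cpp
#include <cstdint>

// Illustrative grid-size computation for a batched GEMM launch: ceiling
// division so edge tiles still get a workgroup, multiplied by batch_count
// so every batch's (M-tile, N-tile) pairs are covered.
std::int64_t batched_gemm_grid_size(std::int64_t M, std::int64_t N,
                                    std::int64_t MPerBlock, std::int64_t NPerBlock,
                                    std::int64_t batch_count)
{
    const std::int64_t tiles_m = (M + MPerBlock - 1) / MPerBlock;
    const std::int64_t tiles_n = (N + NPerBlock - 1) / NPerBlock;
    return batch_count * tiles_m * tiles_n;
}
```

Forgetting the `batch_count` factor (or the ceiling division) makes the kernel silently skip work for batch_count > 1, which matches the symptom described in the bullet.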
  7. 24 Mar, 2022 1 commit
    • Gemm+Reduce Fusion (#128) · f95267f1
      Chao Liu authored
      * add gridwise gemm v4r1
      
      * rename
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * use sfc in shuffling
      
      * remove hardcode
      
      * remove hardcode
      
      * refactor
      
      * fix build
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * format
      
      * clean
      
      * adding gemm+reduce
      
      * adding profiler for gemm+reduce
      
      * adding gemm+reduce profiler
      
      * fix build
      
      * clean up
      
      * gemm+reduce
      
      * fix build
      
      * update DeviceGemm_Xdl_CShuffle; update enum to enum class
      
      * clean up
      
      * add test for gemm+reduce
      
      * clean up
      
      * refactor
      
      * fix build
      
      * fix build
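The fusion in this commit amounts to computing the reduction in the GEMM epilogue, while each output value is still on-chip, instead of writing C and making a second pass over it. A naive single-threaded reference for those semantics (row-major layout; this is a sketch of the math, not the actual CShuffle kernel):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Naive reference for fused gemm+reduce semantics: C = A * B
// (M x K times K x N, row-major) and, in the same pass, d[m] = max over
// row m of C. The real kernel performs the reduce per tile in the epilogue.
void gemm_reduce_max(const std::vector<float>& A, const std::vector<float>& B,
                     std::vector<float>& C, std::vector<float>& d,
                     int M, int N, int K)
{
    C.assign(static_cast<std::size_t>(M) * N, 0.0f);
    d.assign(M, -std::numeric_limits<float>::infinity());
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = acc;
            d[m] = std::max(d[m], acc); // reduce while the value is on hand
        }
}
```

Fusing the reduce this way saves a full read of C from global memory compared to running a standalone reduction kernel afterwards.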