1. 12 May, 2022 1 commit
    • JD's avatar
      Add host API (#220) · cec69bc3
      JD authored
      
      
      * Add host API
      
      * manually rebase on develop
      
      * clean
      
      * manually rebase on develop
      
      * exclude tests from all target
      
      * address review comments
      
      * update client app name
      
      * fix missing lib name
      
      * clang-format update
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix test issue
      
      * refactor
      
      * refactor
      
      * refactor
      
      * upate cmake and readme
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      cec69bc3
  2. 05 Apr, 2022 1 commit
    • ltqin's avatar
      NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166) · 781cacd2
      ltqin authored
      * change backward weight name
      
      * start add bwd weight lib and profiler
      
      * change tuning paramter
      
      * change output info
      
      * add bwd weight test
      
      * change test info
      
      * using conv_util
      
      * change wgt to weight
      
      * add }
      
      * add fp32
      781cacd2
  3. 24 Mar, 2022 1 commit
    • Chao Liu's avatar
      Gemm+Reduce Fusion (#128) · f95267f1
      Chao Liu authored
      * add gridwise gemm v4r1
      
      * rename
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * use sfc in shuffling
      
      * remove hardcode
      
      * remove hardcode
      
      * refactor
      
      * fix build
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * format
      
      * clean
      
      * adding gemm+reduce
      
      * adding profiler for gemm+reduce
      
      * adding gemm+reduce profiler
      
      * fix build
      
      * clean up
      
      * gemm+reduce
      
      * fix build
      
      * update DeviceGemm_Xdl_CShuffle; update enum to enum class
      
      * clean up
      
      * add test for gemm+reduce
      
      * clean up
      
      * refactor
      
      * fix build
      
      * fix build
      f95267f1
  4. 07 Feb, 2022 1 commit
    • Chao Liu's avatar
      GEMM+Bias+ReLU+Add (#76) · 823657ed
      Chao Liu authored
      * tweak conv for odd C
      
      * update script
      
      * clean up elementwise op
      
      * fix build
      
      * clean up
      
      * added example for gemm+bias+relu+add
      
      * added example for gemm+bias+relu
      
      * add profiler for gemm_s_shuffle; re-org files
      
      * add profiler
      
      * fix build
      
      * clean up
      
      * clean up
      
      * clean up
      
      * fix build
      823657ed
  5. 26 Dec, 2021 1 commit
    • Chao Liu's avatar
      Fusion Conv+Bias+ReLU(+Add) (#62) · acbd7bd7
      Chao Liu authored
      * fix relu
      
      * clean up
      
      * clean up
      
      * adding 1x1 conv
      
      * adding 1x1 conv
      
      * added 1x1 conv
      
      * refactor
      
      * refactor
      
      * refactor
      
      * added profiler for conv+bias+relu+add
      
      * clean up
      
      * adding conv+bias+relu
      
      * adding conv+bias+relu
      
      * added conv+bias+relu
      
      * Update README.md
      
      * update cpu verification
      
      * adding c shuffle
      
      * update static_tensor for dealing with invalid element
      
      * adding c shuffle
      
      * debugging
      
      * fix bug
      
      * convert to fp16 before shuffle
      
      * shuffle more than one M/NRepeat
      
      * clean up
      
      * remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1
      
      * clean up
      
      * remove coordinate step hack from all gridwise gemm xdl
      
      * clean up coordinate step hack
      
      * clean up coordinate step hack
      
      * ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst
      
      * adding output shuffle in conv+bias+relu+add
      
      * update
      
      * added conv+bias+relu+add with c shuffle
      
      * added conv+bias+relu+add with c shuffle
      
      * fix forward_sweep bugs in threadwise copy
      
      * clean up
      
      * refactor
      
      * clean up
      
      * clean up
      
      * added conv_c_shuffle+bias_relu
      
      * clean up
      
      * added conv+bias+relu+atomic_add
      
      * clean up
      
      * clean up
      
      * clean up
      
      * clean up
      
      * clean up
      
      * clean up
      
      * misc fixes; add 1x1 specialization
      
      * clean up
      
      * delete unused device op
      
      * clean up
      
      * add support for odd C value
      acbd7bd7
  6. 30 Nov, 2021 1 commit
  7. 14 Nov, 2021 1 commit
    • Chao Liu's avatar
      ckProfiler and device-level XDL GEMM operator (#48) · e823d518
      Chao Liu authored
      * add DeviceGemmXdl
      
      * update script
      
      * fix naming issue
      
      * fix comment
      
      * output HostTensorDescriptor
      
      * rename
      
      * padded GEMM for fwd v4r4r4 nhwc
      
      * refactor
      
      * refactor
      
      * refactor
      
      * adding ckProfiler
      
      * adding ckProfiler
      
      * refactor
      
      * fix tuning parameter bug
      
      * add more gemm instances
      
      * add more fp16 GEMM instances
      
      * fix profiler driver
      
      * fix bug in tuning parameter
      
      * add fp32 gemm instances
      
      * small fix
      
      * refactor
      
      * rename
      
      * refactor gemm profiler; adding DeviceConv and conv profiler
      
      * refactor
      
      * fix
      
      * add conv profiler
      
      * refactor
      
      * adding more GEMM and Conv instance
      
      * Create README.md
      
      Add build instruction for ckProfiler
      
      * Create README.md
      
      Add Readme for gemm_xdl example
      
      * Update README.md
      
      Remove build instruction from top most folder
      
      * Update README.md
      
      * clean up
      e823d518