1. 29 Sep, 2023 1 commit
  2. 28 Sep, 2023 1 commit
  3. 26 Sep, 2023 2 commits
  4. 26 Jul, 2023 1 commit
    • carlushuang's avatar
      initial stream-k implementation with example (#699) · e7dca79d
      carlushuang authored
      
      
      * initial stream-k implementation with example
      
      * fix unexpected change in err
      
      * improve a little bit performance by reorganize pipeline.
      
      * improve perf a little bit by swizzle block idx
      
      * add profiler
      
      * update example
      
      * fix spelling
      
      * shrink karg for streamk
      
      * support dynamic buffer using memory coherence glc_slc bit from template
      
      * control memory coherence while construct dynamic buffer
      
      * update reduction for streamk(not ready yet)
      
      * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
      
      * fix build issue
      
      * fix several bug
      
      * now result is correct, everything works (but has scratch)
      
      * remove scratch by manually reset coordinate
      
      * update device code
      
      * fix a bug in final reduce
      
      * fix something in example
      
      * update async memset
      
      * fix enum as camel case
      
      * modify coherence enum name
      
      * clean code and use atomic streamk by default
      
      * remove unused var
      
      * throw exception if have empty pointer
      
      * fix format
      
      * fix CI warning
      
      * fix type in init
      
      * modify CI error
      
      * filter out on gfx10+
      
      * restore changed example code
      
      ---------
      Co-authored-by: default avatarQianfeng Zhang <Qianfeng.Zhang@amd.com>
      e7dca79d
  5. 31 May, 2023 1 commit
  6. 25 Jun, 2022 2 commits
    • Chao Liu's avatar
      add license in file (#303) · d3051d75
      Chao Liu authored
      d3051d75
    • Chao Liu's avatar
      Absolute include path (#281) · d1db6a0c
      Chao Liu authored
      * ad gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add client app example
      
      * update readme
      
      * delete obselete files
      
      * remove old client app
      
      * delete old file
      
      * cleaning
      
      * clean
      
      * remove half
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path for all examples
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * revert client app example
      
      * clean build
      
      * fix build
      
      * temporary disable client test on Jenkins
      
      * clean
      
      * clean
      
      * clean
      d1db6a0c
  7. 09 Mar, 2022 1 commit
    • Chao Liu's avatar
      Reorganize files, Part 1 (#119) · 5d37d7bf
      Chao Liu authored
      * delete obselete files
      
      * move files
      
      * build
      
      * update cmake
      
      * update cmake
      
      * fix build
      
      * reorg examples
      
      * update cmake for example and test
      5d37d7bf
  8. 04 Mar, 2022 1 commit
    • ltqin's avatar
      NHWC conv 2d: bwd fp32/fp16/bfp16/int8, Device level tuning and host API (#92) · c254e5ab
      ltqin authored
      
      
      * start conv2d bwd api
      
      * kernel running
      
      * add bwd reference
      
      * change to no shuffle
      
      * fix bwd reference
      
      * pass verification
      
      * add Filter1x1Stride1Pad0 and start testing
      
      * change some tuning parameter
      
      * fix test error
      
      * add fp16 tuning parameter
      
      * add bf16 tuning parameter
      
      * add int8 tuning parameters
      
      * change fp32 tuning parameter
      
      * add bwd to profiler
      
      * fix bug for bwd profiler
      
      * fix ckProfiler bug
      
      * change conv2d_bwd_xdl to fp16
      
      * fix bug in comments
      
      * fix precompile id
      
      * fix enum conv name
      
      * chage _bwd_ to _bwd_data_
      
      * change conv2d_bwd example id
      
      * bwd to bwd data
      
      * fix prehead
      
      * fix MakeDefaultBlock2CTileMap ,import form merge develop
      
      * format bwd instance
      
      * bwd to bwd data
      
      * change name bwd to bwd data
      
      * change name bwd to bwd data in example
      
      * formate code
      
      * change conv2d bwd data id in example
      
      * rewrite readme for example
      
      * fix CalculateMagicNumbers about div zero
      
      * add workaround CK_WORKAROUND_SWDEV_325164
      
      * change test_conf2d_bwd_data show info
      
      * format
      
      * fix bug for workaround:CK_WORKAROUND_SWDEV_325164
      
      * formate tuning parameters
      
      * formate tuning parameters again
      
      * formate tuning parameters 3
      
      * formate tuning parameters 4
      
      * remove add function template
      
      * format
      
      * update comment
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      c254e5ab
  9. 23 Feb, 2022 1 commit
    • Jianfeng Yan's avatar
      Conv3d new (#94) · 6dfb92bb
      Jianfeng Yan authored
      
      
      * conv3d compiles but has memory error
      
      * conv3d works
      
      * fix performance issue by using __builtin_amdgc_readfirstlane
      
      * change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
      
      * clang-format
      
      * remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
      
      * format
      
      * remove useless marc
      
      * add comment
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      6dfb92bb
  10. 18 Nov, 2021 1 commit
  11. 23 Aug, 2021 1 commit
  12. 19 Aug, 2021 1 commit
    • Chao Liu's avatar
      Composable kernel init integration v3 (#1097) · 6fe3627a
      Chao Liu authored
      * Squashed 'src/composable_kernel/' content from commit f6edda61
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: f6edda61
      
      * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
      
      * Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5
      
      5781adf5 Update develop (#5) (#6)
      97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
      7b1ec41e refactor
      49c33aae refactor
      54b3e73d rename
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: 5781adf5
      
      
      
      * fix
      
      * refactor
      
      * remove online compilation from CK
      
      * refactor
      
      * fix
      
      * add ctest
      
      * add c-style pointer cast
      
      * vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
      
      * fix clang warning suppression
      
      * tidy
      
      * suppress cppcheck
      
      * fix enum issue
      
      * revert chagnes to hip build
      
      * fix kernel filename
      
      * update CK build script
      
      * rename
      
      * rename
      
      * make innner product compatiable on gfx900
      
      * Update src/include/miopen/solver/ck_utility_common.hpp
      Co-authored-by: default avatarJD <Jehandad.Khan@amd.com>
      
      * compiler parameter use stream
      
      * use int instead of index_t in kernel wrapper
      
      * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
      
      * refactor
      
      * refactor
      
      * change cmakelist
      
      * change ck common utility
      
      * fix
      Co-authored-by: default avatarJD <Jehandad.Khan@amd.com>
      6fe3627a
  13. 27 Jul, 2021 1 commit
    • Chao Liu's avatar
      [MIOpen Downstream] Initial MIOpen integration (#52) · f63a23ac
      Chao Liu authored
      * update online kernel wrapper bundle all descriptors in a tuple
      
      * change __CONSTANT__ to CONSTANT
      
      * rename
      
      * adding tuning
      
      * added IsValidCompileParameter
      
      * reorginze
      
      * adding tunable for fp16 and int8
      
      * fix kernel compile warning and bug fixes
      
      * suppress warning about cast CONSTANT (address space 4) pointer
      
      * fix building issue
      f63a23ac
  14. 10 Jun, 2021 1 commit
  15. 13 Apr, 2021 1 commit