1. 08 Jun, 2023 1 commit
  2. 31 May, 2023 1 commit
  3. 02 Aug, 2022 1 commit
    • Adam Osewski's avatar
      CGEMM examples bf16, fp32, int8 (#332) · fb0dc358
      Adam Osewski authored
      
      
      * Add int8 specialization for elementwise Add and Subtract.
      
      * CGEMM examples bf16, fp32, int8
      
      * Add convert reference output to CDataType.
      
      * Skip BF16 data type during testing.
      
      * Lower K value to get rid of accumulation error.
      
      * Fix merge artifact.
      
      * Fix changed function name: GetElementSpaceSize()
      
      * Fix merge artifact.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      fb0dc358
  4. 25 Jun, 2022 2 commits
    • Chao Liu's avatar
      add license in file (#303) · d3051d75
      Chao Liu authored
      d3051d75
    • Chao Liu's avatar
      Absolute include path (#281) · d1db6a0c
      Chao Liu authored
      * ad gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add client app example
      
      * update readme
      
      * delete obselete files
      
      * remove old client app
      
      * delete old file
      
      * cleaning
      
      * clean
      
      * remove half
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path for all examples
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * revert client app example
      
      * clean build
      
      * fix build
      
      * temporary disable client test on Jenkins
      
      * clean
      
      * clean
      
      * clean
      d1db6a0c
  5. 25 May, 2022 1 commit
  6. 24 May, 2022 1 commit
    • Qianfeng's avatar
      Overhaul to Reducton and its dependants (#237) · 63eee2d9
      Qianfeng authored
      * Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type
      
      * Update to host layer and host reduction
      
      * Merge and remove reduction kernels
      
      * Merge and remove reduction device interfaces and update pooling device interface
      
      * Merge and remove useless reduction device instances
      
      * Update to reduction profiler and reduction ctests
      
      * Update to reduction and pooling examples and add one reduction example
      
      * Change to reduction examples to let them testable by ctest
      
      * Add explicit pass checking for reduction and pooling examples
      
      * Explicit assignment of tensor shapes in example reduce_blockwise_two_call
      
      * Use atomic_add to repace atomicAdd and add atomic_add for double type
      
      * Add reduce ctest support for double data type
      
      * Replace to_int_vector() by using c++ std::vector::assign()
      
      * Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise
      
      * Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock
      
      * Add GetAtomicOperationZeroValue() support for AtomicMax
      
      * Tiny change to reduce example README.md
      
      * Fix some tiny issues due to branch merging
      
      * Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t
      
      * Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64
      
      * Renaming
      
      * Clean the header includings in device_reduce instances header files
      63eee2d9
  7. 20 May, 2022 1 commit
    • rocking5566's avatar
      Gemm reduce max (#209) · 0ffe956a
      rocking5566 authored
      
      
      * [What] Rename the example
      [Why] Prepare to add unary reduction
      
      * Add global oparation to the parameter
      
      * Add atomicmax
      
      * Fix compile error
      
      * Support atomicMax (hip library)
      
      * Rename the reduction example
      
      * Fix target name
      
      * use p_d1_grid as the indicator directly
      
      * Prevent performance issue. Let passthrough handle it.
      
      * Implement the function template the specialize the float2
      
      * No need to separate into two lines
      
      * Remove empty line
      
      * add comment
      
      * Fix compile error due to merge from develop
      
      * make the implementation of atomic_max / atomic_add explicit for each datatype
      
      * Refine typo
      
      * For future CI test
      
      * Fix compiler error in ckProfiler
      
      * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'
      
      * simply use remove_pointer
      
      * Rename type and var
      
      * Refine example
      
      * Modify reducemax example
      
      * Fix bug in reduction
      
      * Change initialize range
      
      * Implement F64 version of atomicMax
      
      * Move reduction  code together
      
      * Add buffer atomic_max
      
      * Fix coding style by clang-format
      
      * Integrate new api of DeviceGemmReduce_Xdl_CShuffle
      
      * Integrate Batch gemm reduction
      
      * Fix example
      
      * fix example
      
      * clean up
      
      * Fix batch gemm tensor operation
      
      * Fix coding style
      
      * Fix template augument
      
      * Fix clang format
      
      * Keep flexible of different stride for each D tensor
      
      * Fix compile error for ckProfiler
      
      * Fix typo
      
      * [What] Fix naming
      [Why] Prepare to add out elementop
      
      * Add DoutElementOp
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarrocking <chunylai@amd.com>
      0ffe956a
  8. 15 Apr, 2022 1 commit
    • Illia Silin's avatar
      Compile CK for all targets (#188) · 4221505d
      Illia Silin authored
      
      
      * compile ck for all targets
      
      * update the target criteria
      
      * change the target condition
      
      * fixed some typos
      
      * fixed missed file
      
      * revert changes in README
      
      * revert device_conv3d_fwd_xdl_...
      
      * update device_conv3d_fwd_xdl_...
      
      * update device_batched_gemm_reduce...
      
      * test the unused arguments fix
      
      * test the warning suppression
      
      * try suppress warnings in device_batched_gemm_reduce_xdl...
      
      * fix the last warnings
      
      * replace UNUSED with std::ignore
      
      * fix a typo
      
      * replaced std::ignore with ignore
      
      * add igonre header to common_header
      
      * refactor atomicAdd
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      4221505d
  9. 31 Mar, 2022 1 commit
    • Chao Liu's avatar
      Compile for gfx908 and gfx90a (#130) · cd167e49
      Chao Liu authored
      * adding compilation for multiple targets
      
      * fix build
      
      * clean
      
      * update Jekinsfile
      
      * update readme
      
      * update Jenkins
      
      * use ck::half_t instead of ushort for bf16
      
      * rename enum classes
      
      * clean
      
      * rename
      
      * clean
      cd167e49
  10. 09 Mar, 2022 1 commit
    • Chao Liu's avatar
      Reorganize files, Part 1 (#119) · 5d37d7bf
      Chao Liu authored
      * delete obselete files
      
      * move files
      
      * build
      
      * update cmake
      
      * update cmake
      
      * fix build
      
      * reorg examples
      
      * update cmake for example and test
      5d37d7bf
  11. 04 Mar, 2022 1 commit
    • Jianfeng Yan's avatar
      Refactor threadwise copy using sfcurve (#101) · 0619ebf7
      Jianfeng Yan authored
      
      
      * add space_filling_curve
      
      * cleanup and move space_filling_curve into test
      
      * WIP: start refactoring threadwise_transfer_v1r3
      
      * threadwise_copy works but needs further refactoring
      
      * add some comments
      
      * add SpaceFillingCurve::GetIndices()
      
      * minor changes
      
      * removed GetIndices; refactored GetDstCoordinateResetStep
      
      * add DynamicBuffer::Transfer, but Add is not tested
      
      * rebased agaist develop
      
      * threadwise_copy_v6r1/v6r2/v6r3 using space-filling curve start to work
      
      * minor changes
      
      * refactored threadcopy v3r1, v2; removed old implementations
      
      * clang-format
      
      * cleanup
      
      * fix a typo in v6r3
      
      * format
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      0619ebf7
  12. 23 Feb, 2022 1 commit
    • Jianfeng Yan's avatar
      Conv3d new (#94) · 6dfb92bb
      Jianfeng Yan authored
      
      
      * conv3d compiles but has memory error
      
      * conv3d works
      
      * fix performance issue by using __builtin_amdgc_readfirstlane
      
      * change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
      
      * clang-format
      
      * remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
      
      * format
      
      * remove useless marc
      
      * add comment
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      6dfb92bb
  13. 12 Feb, 2022 1 commit
    • ltqin's avatar
      NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73) · 880fbee9
      ltqin authored
      
      
      * add fwd bf16 conv
      
      * change tunning parametor
      
      * add int8 for conv fwd
      
      * remove comments
      
      * change tunning parametor for int8
      
      * change init int8 example
      
      * add test for conv2d fwd
      
      * change device operation file pos because merge develop
      
      * fwd int8 use reference
      
      * test_conv_fwd use reference
      
      * add braket for if statement
      
      * rename fwd example name
      
      * remove StaticBufferOfVectorTypeV2
      
      * tweak example
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      880fbee9
  14. 03 Feb, 2022 1 commit
  15. 18 Nov, 2021 1 commit
  16. 27 Aug, 2021 2 commits
    • Chao Liu's avatar
      Misc fixes (#24) · 10bb8110
      Chao Liu authored
      * use cast_pointer_to_generic_address_space() in v6r1 kernel wrapper, DynamcBuffer and buffer_load take customized invalid-element-value, add buffer_load/store for fp64
      
      * use remove_cvref_t
      10bb8110
    • Qianfeng's avatar
      [SWDEV-281541][MSRCHA-100] Implementation of Dynamic Generic Reduction (#1108) · 9e80cdce
      Qianfeng authored
      
      
      * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
      
      * make inner product compatible on gfx900
      
      * Update src/include/miopen/solver/ck_utility_common.hpp
      
      * compiler parameter use stream
      
      * use int instead of index_t in kernel wrapper
      
      * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
      
      * Add dynamic generic reduction kernel layer (kernel wrappers, kernel implementations and utilities)
      
      * Some updates to dynamic composable kernel facility for the need of dynamic generic reduction
      
      * Update to generic reduction C++ host interface layer to support dynamic generic reduction
      
      * Update to remove tidy complaints in host interface layer
      
      * Change the unary operator form from void op(T &x) to T op(T x)
      
      * Update to pass single workspace pointer for all kernels (fix for OpenCL backend)
      
      * Use cppcheck-suppress to prevent some strange warnings
      
      * Re-use operator [] and () for DynamicBuffer and update to depending codes
      
      * Remove useless codes in first call threadwise/warpwise/blockwise kernel wrappers
      
      * [performance] Remove un-needed local buffer initialization
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarJD <Jehandad.Khan@amd.com>
      9e80cdce
  17. 25 Aug, 2021 1 commit
  18. 19 Aug, 2021 1 commit
    • Chao Liu's avatar
      Composable kernel init integration v3 (#1097) · 6fe3627a
      Chao Liu authored
      * Squashed 'src/composable_kernel/' content from commit f6edda61
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: f6edda61
      
      * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
      
      * Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5
      
      5781adf5 Update develop (#5) (#6)
      97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
      7b1ec41e refactor
      49c33aae refactor
      54b3e73d rename
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: 5781adf5
      
      
      
      * fix
      
      * refactor
      
      * remove online compilation from CK
      
      * refactor
      
      * fix
      
      * add ctest
      
      * add c-style pointer cast
      
      * vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
      
      * fix clang warning suppression
      
      * tidy
      
      * suppress cppcheck
      
      * fix enum issue
      
      * revert chagnes to hip build
      
      * fix kernel filename
      
      * update CK build script
      
      * rename
      
      * rename
      
      * make innner product compatiable on gfx900
      
      * Update src/include/miopen/solver/ck_utility_common.hpp
      Co-authored-by: default avatarJD <Jehandad.Khan@amd.com>
      
      * compiler parameter use stream
      
      * use int instead of index_t in kernel wrapper
      
      * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
      
      * refactor
      
      * refactor
      
      * change cmakelist
      
      * change ck common utility
      
      * fix
      Co-authored-by: default avatarJD <Jehandad.Khan@amd.com>
      6fe3627a
  19. 16 Aug, 2021 1 commit
  20. 13 Aug, 2021 1 commit
  21. 10 Aug, 2021 2 commits
  22. 09 Aug, 2021 1 commit
  23. 27 Jul, 2021 1 commit
    • Chao Liu's avatar
      [MIOpen Downstream] Initial MIOpen integration (#52) · f63a23ac
      Chao Liu authored
      * update online kernel wrapper bundle all descriptors in a tuple
      
      * change __CONSTANT__ to CONSTANT
      
      * rename
      
      * adding tuning
      
      * added IsValidCompileParameter
      
      * reorginze
      
      * adding tunable for fp16 and int8
      
      * fix kernel compile warning and bug fixes
      
      * suppress warning about cast CONSTANT (address space 4) pointer
      
      * fix building issue
      f63a23ac
  24. 05 Jul, 2021 1 commit
    • Chao Liu's avatar
      DL GEMM fp32/fp16/int8 (#41) · b8b2d0a6
      Chao Liu authored
      * add threadwise copy the copy a tensor in one copy, added kpack to DL GEMM
      
      * add kpack into fwd v4r5 nchw fp32
      b8b2d0a6
  25. 12 May, 2021 1 commit