"composable_kernel/include/utility/sequence.hpp" did not exist on "c82b833d8e76094a3702046d81872132d5c4b15a"
  1. 31 Jan, 2025 1 commit
    • arai713's avatar
      Codegen hipRTC compilation (#1579) · 2e3183af
      arai713 authored
      
      
      * updating codegen build for MIOpen access: adding .cmake for codegen component
      
      * updating CMake
      
      * adding in header guards for some headers due to issues with hiprtc compilation in MIOpen
      
      * some more header guards
      
      * putting env file in header guard
      
      * cleaning up some includes
      
      * updated types file for hiprtc purposes
      
      * fixed types file: bit-wise/memcpy issue
      
      * updating multiple utility files to deal with standard header inclusion for hiprtc
      
      * added some more header guards in the utility files, replacing some standard header functionality
      
      * added some more header guards
      
      * fixing some conflicts in utility files, another round of header guards
      
      * fixing errors in data type file
      
      * resolved conflict errors in a few utility files
      
      * added header guards/replicated functionality in device files
      
      * resolved issues with standard headers in device files: device_base and device_grouped_conv_fwd_multiple_abd
      
      * resolved issues with standard headers in device files: device_base.hpp, device_grouped_conv_fwd_multiple_abd.hpp, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp
      
      * added header guards for gridwise gemm files: gridwise_gemm_multiple_abd_xdl_cshuffle.hpp and gridwise_gemm_multiple_d_xdl_cshuffle.hpp
      
      * fixed issue with numerics header, removed from transform_conv_fwd_to_gemm and added to device_column_to_image_impl, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3, device_image_to_column_impl
      
      * replaced standard header usage and added header guards in block to ctile map and gridwise_gemm_pipeline_selector
      
      * resolved errors in device_gemm_xdl_splitk_c_shuffle files in regards to replacement of standard headers in previous commit
      
      * added replicated functionality for standard header methods in utility files
      
      * replaced standard header functionality in threadwise tensor slice transfer files and added header guards in element_wise_operation.hpp
      
      * temp fix for namespace error in MIOpen
      
      * remove standard header usage in codegen device op
      
      * removed standard header usage in elementwise files, resolved namespace errors
      
      * formatting fix
      
      * changed codegen argument to ON for testing
      
      * temporarily removing codegen compiler flag for testing purposes
      
      * added codegen flag again, set default to ON
      
      * set codegen flag default back to OFF
      
      * replaced enable_if_t standard header usage in data_type.hpp
      
      * added some debug prints to pinpoint issues in MIOpen
      
      * added print outs to debug in MIOpen
      
      * removed debug print outs from device op
      
      * resolved stdexcept include error
      
      * formatting fix
      
      * adding includes to new fp8 file to resolve ck::enable_if_t errors
      
      * made changes to amd_wave_read_first_lane
      
      * updated functionality in type utility file
      
      * fixed end of file issue
      
      * resovled errors in type utility file, added functionality to array utility file
      
      * fixed standard header usage replication in data_type file, resolves error with failing examples on navi3x
      
      * formatting fix
      
      * replaced standard header usage in amd_ck_fp8 file
      
      * added include to random_gen file
      
      * removed and replicated standard header usage from data_type and type_convert files for fp8 changes
      
      * replicated standard unsigned integer types in random_gen
      
      * resolved comments from review: put calls to reinterpret_cast for size_t in header guards
      
      * updated/added copyright headers
      
      * removed duplicate header
      
      * fixed typo in header guard
      
      * updated copyright headers
      
      ---------
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      2e3183af
  2. 26 Jul, 2023 1 commit
    • carlushuang's avatar
      initial stream-k implementation with example (#699) · e7dca79d
      carlushuang authored
      
      
      * initial stream-k implementation with example
      
      * fix unexpected change in err
      
      * improve a little bit performance by reorganize pipeline.
      
      * improve perf a little bit by swizzle block idx
      
      * add profiler
      
      * update example
      
      * fix spelling
      
      * shrink karg for streamk
      
      * support dynamic buffer using memory coherence glc_slc bit from template
      
      * control memory coherence while construct dynamic buffer
      
      * update reduction for streamk(not ready yet)
      
      * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
      
      * fix build issue
      
      * fix several bug
      
      * now result is correct, everything works (but has scratch)
      
      * remove scratch by manually reset coordinate
      
      * update device code
      
      * fix a bug in final reduce
      
      * fix something in example
      
      * update async memset
      
      * fix enum as camel case
      
      * modify coherence enum name
      
      * clean code and use atomic streamk by default
      
      * remove unused var
      
      * throw exception if have empty pointer
      
      * fix format
      
      * fix CI warning
      
      * fix type in init
      
      * modify CI error
      
      * filter out on gfx10+
      
      * restore changed example code
      
      ---------
      Co-authored-by: default avatarQianfeng Zhang <Qianfeng.Zhang@amd.com>
      e7dca79d
  3. 31 May, 2023 1 commit
  4. 25 Jun, 2022 2 commits
    • Chao Liu's avatar
      add license in file (#303) · d3051d75
      Chao Liu authored
      d3051d75
    • Chao Liu's avatar
      Absolute include path (#281) · d1db6a0c
      Chao Liu authored
      * ad gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add client app example
      
      * update readme
      
      * delete obselete files
      
      * remove old client app
      
      * delete old file
      
      * cleaning
      
      * clean
      
      * remove half
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path for all examples
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * revert client app example
      
      * clean build
      
      * fix build
      
      * temporary disable client test on Jenkins
      
      * clean
      
      * clean
      
      * clean
      d1db6a0c
  5. 09 Mar, 2022 1 commit
    • Chao Liu's avatar
      Reorganize files, Part 1 (#119) · 5d37d7bf
      Chao Liu authored
      * delete obselete files
      
      * move files
      
      * build
      
      * update cmake
      
      * update cmake
      
      * fix build
      
      * reorg examples
      
      * update cmake for example and test
      5d37d7bf
  6. 04 Mar, 2022 1 commit
    • ltqin's avatar
      NHWC conv 2d: bwd fp32/fp16/bfp16/int8, Device level tuning and host API (#92) · c254e5ab
      ltqin authored
      
      
      * start conv2d bwd api
      
      * kernel running
      
      * add bwd reference
      
      * change to no shuffle
      
      * fix bwd reference
      
      * pass verification
      
      * add Filter1x1Stride1Pad0 and start testing
      
      * change some tuning parameter
      
      * fix test error
      
      * add fp16 tuning parameter
      
      * add bf16 tuning parameter
      
      * add int8 tuning parameters
      
      * change fp32 tuning parameter
      
      * add bwd to profiler
      
      * fix bug for bwd profiler
      
      * fix ckProfiler bug
      
      * change conv2d_bwd_xdl to fp16
      
      * fix bug in comments
      
      * fix precompile id
      
      * fix enum conv name
      
      * chage _bwd_ to _bwd_data_
      
      * change conv2d_bwd example id
      
      * bwd to bwd data
      
      * fix prehead
      
      * fix MakeDefaultBlock2CTileMap ,import form merge develop
      
      * format bwd instance
      
      * bwd to bwd data
      
      * change name bwd to bwd data
      
      * change name bwd to bwd data in example
      
      * formate code
      
      * change conv2d bwd data id in example
      
      * rewrite readme for example
      
      * fix CalculateMagicNumbers about div zero
      
      * add workaround CK_WORKAROUND_SWDEV_325164
      
      * change test_conf2d_bwd_data show info
      
      * format
      
      * fix bug for workaround:CK_WORKAROUND_SWDEV_325164
      
      * formate tuning parameters
      
      * formate tuning parameters again
      
      * formate tuning parameters 3
      
      * formate tuning parameters 4
      
      * remove add function template
      
      * format
      
      * update comment
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      c254e5ab
  7. 23 Feb, 2022 1 commit
    • Jianfeng Yan's avatar
      Conv3d new (#94) · 6dfb92bb
      Jianfeng Yan authored
      
      
      * conv3d compiles but has memory error
      
      * conv3d works
      
      * fix performance issue by using __builtin_amdgc_readfirstlane
      
      * change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
      
      * clang-format
      
      * remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
      
      * format
      
      * remove useless marc
      
      * add comment
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      6dfb92bb
  8. 18 Nov, 2021 1 commit
  9. 23 Aug, 2021 1 commit
  10. 19 Aug, 2021 1 commit
    • Chao Liu's avatar
      Composable kernel init integration v3 (#1097) · 6fe3627a
      Chao Liu authored
      * Squashed 'src/composable_kernel/' content from commit f6edda61
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: f6edda61
      
      * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
      
      * Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5
      
      5781adf5 Update develop (#5) (#6)
      97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
      7b1ec41e refactor
      49c33aae refactor
      54b3e73d rename
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: 5781adf5
      
      
      
      * fix
      
      * refactor
      
      * remove online compilation from CK
      
      * refactor
      
      * fix
      
      * add ctest
      
      * add c-style pointer cast
      
      * vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
      
      * fix clang warning suppression
      
      * tidy
      
      * suppress cppcheck
      
      * fix enum issue
      
      * revert chagnes to hip build
      
      * fix kernel filename
      
      * update CK build script
      
      * rename
      
      * rename
      
      * make innner product compatiable on gfx900
      
      * Update src/include/miopen/solver/ck_utility_common.hpp
      Co-authored-by: default avatarJD <Jehandad.Khan@amd.com>
      
      * compiler parameter use stream
      
      * use int instead of index_t in kernel wrapper
      
      * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
      
      * refactor
      
      * refactor
      
      * change cmakelist
      
      * change ck common utility
      
      * fix
      Co-authored-by: default avatarJD <Jehandad.Khan@amd.com>
      6fe3627a
  11. 27 Jul, 2021 1 commit
    • Chao Liu's avatar
      [MIOpen Downstream] Initial MIOpen integration (#52) · f63a23ac
      Chao Liu authored
      * update online kernel wrapper bundle all descriptors in a tuple
      
      * change __CONSTANT__ to CONSTANT
      
      * rename
      
      * adding tuning
      
      * added IsValidCompileParameter
      
      * reorginze
      
      * adding tunable for fp16 and int8
      
      * fix kernel compile warning and bug fixes
      
      * suppress warning about cast CONSTANT (address space 4) pointer
      
      * fix building issue
      f63a23ac
  12. 10 Jun, 2021 1 commit
  13. 13 Apr, 2021 1 commit