1. 31 Jan, 2025 1 commit
    • arai713's avatar
      Codegen hipRTC compilation (#1579) · 2e3183af
      arai713 authored
      
      
      * updating codegen build for MIOpen access: adding .cmake for codegen component
      
      * updating CMake
      
      * adding in header guards for some headers due to issues with hiprtc compilation in MIOpen
      
      * some more header guards
      
      * putting env file in header guard
      
      * cleaning up some includes
      
      * updated types file for hiprtc purposes
      
      * fixed types file: bit-wise/memcpy issue
      
      * updating multiple utility files to deal with standard header inclusion for hiprtc
      
      * added some more header guards in the utility files, replacing some standard header functionality
      
      * added some more header guards
      
      * fixing some conflicts in utility files, another round of header guards
      
      * fixing errors in data type file
      
      * resolved conflict errors in a few utility files
      
      * added header guards/replicated functionality in device files
      
      * resolved issues with standard headers in device files: device_base and device_grouped_conv_fwd_multiple_abd
      
      * resolved issues with standard headers in device files: device_base.hpp, device_grouped_conv_fwd_multiple_abd.hpp, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp
      
      * added header guards for gridwise gemm files: gridwise_gemm_multiple_abd_xdl_cshuffle.hpp and gridwise_gemm_multiple_d_xdl_cshuffle.hpp
      
      * fixed issue with numerics header, removed from transform_conv_fwd_to_gemm and added to device_column_to_image_impl, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3, device_image_to_column_impl
      
      * replaced standard header usage and added header guards in block to ctile map and gridwise_gemm_pipeline_selector
      
      * resolved errors in device_gemm_xdl_splitk_c_shuffle files in regards to replacement of standard headers in previous commit
      
      * added replicated functionality for standard header methods in utility files
      
      * replaced standard header functionality in threadwise tensor slice transfer files and added header guards in element_wise_operation.hpp
      
      * temp fix for namespace error in MIOpen
      
      * remove standard header usage in codegen device op
      
      * removed standard header usage in elementwise files, resolved namespace errors
      
      * formatting fix
      
      * changed codegen argument to ON for testing
      
      * temporarily removing codegen compiler flag for testing purposes
      
      * added codegen flag again, set default to ON
      
      * set codegen flag default back to OFF
      
      * replaced enable_if_t standard header usage in data_type.hpp
      
      * added some debug prints to pinpoint issues in MIOpen
      
      * added print outs to debug in MIOpen
      
      * removed debug print outs from device op
      
      * resolved stdexcept include error
      
      * formatting fix
      
      * adding includes to new fp8 file to resolve ck::enable_if_t errors
      
      * made changes to amd_wave_read_first_lane
      
      * updated functionality in type utility file
      
      * fixed end of file issue
      
      * resovled errors in type utility file, added functionality to array utility file
      
      * fixed standard header usage replication in data_type file, resolves error with failing examples on navi3x
      
      * formatting fix
      
      * replaced standard header usage in amd_ck_fp8 file
      
      * added include to random_gen file
      
      * removed and replicated standard header usage from data_type and type_convert files for fp8 changes
      
      * replicated standard unsigned integer types in random_gen
      
      * resolved comments from review: put calls to reinterpret_cast for size_t in header guards
      
      * updated/added copyright headers
      
      * removed duplicate header
      
      * fixed typo in header guard
      
      * updated copyright headers
      
      ---------
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      2e3183af
  2. 28 Jan, 2025 1 commit
  3. 20 Jan, 2025 1 commit
  4. 15 Jan, 2025 1 commit
  5. 07 Nov, 2024 1 commit
  6. 21 Aug, 2024 1 commit
    • Rostyslav Geyyer's avatar
      Set RNE fp8 conversion as a default (#1458) · e20f20ef
      Rostyslav Geyyer authored
      * Set RNE fp8 conversion as a default
      
      * Update f8 tests
      
      * Disable failing test on gfx11
      
      * Update bf8 tests
      
      * Add a flag
      
      * Fix the flag
      
      * Raise flag for gfx10 as well
      
      * Temp commit for tolerance testing
      
      * Update tolerances
      e20f20ef
  7. 27 Jun, 2024 1 commit
  8. 17 Jun, 2024 1 commit
  9. 10 May, 2024 1 commit
  10. 07 May, 2024 1 commit
  11. 29 Apr, 2024 1 commit
  12. 02 Apr, 2024 1 commit
    • Illia Silin's avatar
      Split the instances by architecture. (#1223) · ae57e593
      Illia Silin authored
      * parse examples inside the add_example_executable function
      
      * fix the example 64 cmake file
      
      * add xdl flag to the gemm_bias_softmax_gemm_permute example
      
      * add filtering of tests based on architecture type
      
      * enable test_grouped_gemm for gfx9 only
      
      * enable test_transpose only for gfx9
      
      * only linnk test_transpose if it gets built
      
      * split the gemm instances by architectures
      
      * split gemm_bilinear,grouped_conv_bwd_weight instances by targets
      
      * split instances by architecture
      
      * split grouped_conv instances by architecture
      
      * fix clang format
      
      * fix the if-else logic in group_conv headers
      
      * small fix for grouped convolution instances
      
      * fix the grouped conv bwd weight dl instances
      
      * fix client examples
      
      * only enable client examples 3 and 4 on gfx9
      
      * set the gfx9 macro
      
      * make sure the architecture macros are set by cmake
      
      * use separate set of xdl/wmma flags for host code
      
      * sinmplify the main cmake file
      
      * add conv_fwd_bf8 instance declaration
      ae57e593
  13. 02 Feb, 2024 1 commit
  14. 15 Jan, 2024 1 commit
    • Illia Silin's avatar
      Add cppcheck to CK CI. (#1125) · e6d099c8
      Illia Silin authored
      * add cppcheck to the CK CI
      
      * fix the path to CK source for cppcheck
      
      * fix the path to CK source for cppcheck one more time
      
      * fix the path to CK source for cppcheck third time
      
      * change the path to ck_cppcheck.log
      
      * install latest cppcheck from source
      
      * fix bug in ck.hpp and use 20 threads for cppcheck
      
      * create a switch to turn cppckeck on and off in CI
      e6d099c8
  15. 03 Dec, 2023 1 commit
    • Bartlomiej Wroblewski's avatar
      Add support for double buffering in direct load GEMM kernel (#1052) · bc4bf9bd
      Bartlomiej Wroblewski authored
      This PR introduces support for double buffering in LDS into GEMM kernels that use direct load instructions.
      
      Direct loads now use inline asm instead of intrinsics. Usage of intrinsics results in compiler adding additional waitcnt instructions what breaks possible load/compute overlap in case of double buffering.
      
      Usage of inline asm results in the need to use sched_barrier in order to make sure that compiler cannot incorrectly reschedule instructions since it does not know the data dependencies between global->LDS and LDS->registers.
      bc4bf9bd
  16. 28 Nov, 2023 1 commit
  17. 19 Oct, 2023 1 commit
  18. 23 Aug, 2023 1 commit
    • Jun Liu's avatar
      [HotFix] add config and version files to pass on build info (#856) · c8a8385f
      Jun Liu authored
      * experiment with config file
      
      * experiment with version.h config
      
      * add more info to version.h
      
      * minor updates
      
      * minor updates
      
      * fix case where DTYPE is not used
      
      * large amount of files but minor changes
      
      * remove white space
      
      * minor changes to add more MACROs
      
      * fix cmakedefine01
      
      * fix issue with CK internal conflict
      
      * fix define and define value
      
      * fix clang-format
      
      * fix formatting issue
      
      * experiment with cmake
      
      * clang format v12 to be consistent with miopen
      
      * avoid clang-format for config file
      c8a8385f
  19. 22 Aug, 2023 1 commit
  20. 14 Aug, 2023 1 commit
  21. 03 Aug, 2023 2 commits
  22. 27 Jul, 2023 1 commit
  23. 06 Jul, 2023 1 commit
    • Po Yen Chen's avatar
      Split GEMM instance library & enable pipeline v2 optimization (#783) · 850144a0
      Po Yen Chen authored
      * Move source file into sub-directories
      
      * Add missing include directive
      
      * Split DeviceGemmXdl<> fp16 instances
      
      * Fix format
      
      * Remove unnecessary CMakeLists.txt
      
      * Add macros to toggle new features
      
      * Remove debug message
      
      * Turn off GEMM v2 pipeline optimization by default
      
      * Fix format
      
      * Extract duplicated string as list
      
      * Enlarge indent in CMakeLists.txt
      850144a0
  24. 21 Jun, 2023 1 commit
  25. 15 Jun, 2023 1 commit
    • Illia Silin's avatar
      Enable gfx941 and gfx942 architectures. (#752) · 027e46ee
      Illia Silin authored
      * enable gfx941/942 targets
      
      * fix clang format
      
      * fix the cmake logic for multiple targets
      
      * fix cmake syntax for looping over targets
      
      * add gfx941/942 support for gemm_xdl instances
      027e46ee
  26. 31 May, 2023 1 commit
  27. 04 May, 2023 1 commit
    • Rostyslav Geyyer's avatar
      Optimize bf16 conversion (#664) · b076a02a
      Rostyslav Geyyer authored
      * Add TypeConvert class and start refactoring
      
      * Refactor TypeConvert as a struct
      
      * Get back to template functions type_convert
      
      * Add a type_convert_bf16_rtn, set rtz as default
      
      * Clean up
      
      * Add UnaryConvertPrecision struct for high-precision workloads
      
      * Format
      
      * Update type_convert to UnaryConvert on threadwise level
      
      * Update UnaryConvertPrecision
      
      * Format
      
      * Fix chmod
      
      * Add a flag to pick converion method
      
      * Format
      
      * Remove the added flag
      
      * Merge elementwise op with type conversion
      
      * Move type_convert to elemwise op, update the op
      
      * Update type_convert_precision -> bf16_convert_rtn
      
      * Clean up
      
      * Update comments
      
      * Update the CK_WORKAROUND_DENORM_FIX flag handling
      
      * Update the unneeded op to work but warn user
      
      * Remove the message
      
      * Use a PassThrough instead of ConvertBF16RTN to calcaulate reference
      
      * Format
      
      * Add missing include
      b076a02a
  28. 28 Apr, 2023 1 commit
  29. 11 Apr, 2023 1 commit
  30. 30 Mar, 2023 1 commit
  31. 20 Mar, 2023 1 commit
  32. 09 Mar, 2023 1 commit
  33. 27 Feb, 2023 1 commit
  34. 15 Feb, 2023 1 commit
  35. 18 Jan, 2023 1 commit
    • Raman R jana's avatar
      Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760
      Raman R jana authored
      
      
      * wavelet gemm programming model support for CK
      
      * GEMM pipeline update for wavelet progrmmaing model
      
      * Updated wavelet programming pipeline
      
      * fixes for global-write for math-wave
      
      * fixed bug in global writes
      
      * Updated comments for better readability
      
      * fixed clang format errors
      
      * added block_lds without barrier sync
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor
      
      * prototype
      
      4 layouts
      
      fix default stride
      
      all problem sizes
      
      tidy
      
      move file
      
      update build script
      
      restore old file
      
      fix build
      
      * refactor standalone test to use gemm test harness
      
      * simplify gemm test
      
      * update build script
      
      * remove redundant
      
      * early return when cmd arg doesn't match
      
      * tidy
      
      * report failure when result not validated
      
      * tidy
      
      * Add comment depicting B2C mapping pattern.
      
      * Formatting & comments.
      
      * Comparison with custom B2C mapping pattern.
      
      * Example for wavelet gemm.
      
      * Add wavelet to Gemm standalone test.
      
      * Remove debug code.
      
      * Remove dangling #endif directive.
      
      Co-authored-by: root <Raman Jana>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarAnthony Chang <ac.chang@outlook.com>
      Co-authored-by: default avatarAdam Osewski <19374865+aosewski@users.noreply.github.com>
      1cfa8760
  36. 12 Jan, 2023 1 commit
  37. 02 Dec, 2022 1 commit
  38. 17 Nov, 2022 1 commit
  39. 02 Nov, 2022 1 commit