1. 28 Jan, 2025 1 commit
  2. 24 Jan, 2025 1 commit
  3. 20 Jan, 2025 1 commit
  4. 15 Jan, 2025 1 commit
  5. 07 Nov, 2024 1 commit
  6. 30 Sep, 2024 1 commit
  7. 25 Sep, 2024 1 commit
  8. 21 Aug, 2024 1 commit
    • Rostyslav Geyyer's avatar
      Set RNE fp8 conversion as a default (#1458) · e20f20ef
      Rostyslav Geyyer authored
      * Set RNE fp8 conversion as a default
      
      * Update f8 tests
      
      * Disable failing test on gfx11
      
      * Update bf8 tests
      
      * Add a flag
      
      * Fix the flag
      
      * Raise flag for gfx10 as well
      
      * Temp commit for tolerance testing
      
      * Update tolerances
      e20f20ef
  9. 27 Jun, 2024 1 commit
  10. 17 Jun, 2024 1 commit
  11. 10 May, 2024 1 commit
  12. 07 May, 2024 1 commit
  13. 29 Apr, 2024 1 commit
  14. 02 Apr, 2024 1 commit
    • Illia Silin's avatar
      Split the instances by architecture. (#1223) · ae57e593
      Illia Silin authored
      * parse examples inside the add_example_executable function
      
      * fix the example 64 cmake file
      
      * add xdl flag to the gemm_bias_softmax_gemm_permute example
      
      * add filtering of tests based on architecture type
      
      * enable test_grouped_gemm for gfx9 only
      
      * enable test_transpose only for gfx9
      
      * only linnk test_transpose if it gets built
      
      * split the gemm instances by architectures
      
      * split gemm_bilinear,grouped_conv_bwd_weight instances by targets
      
      * split instances by architecture
      
      * split grouped_conv instances by architecture
      
      * fix clang format
      
      * fix the if-else logic in group_conv headers
      
      * small fix for grouped convolution instances
      
      * fix the grouped conv bwd weight dl instances
      
      * fix client examples
      
      * only enable client examples 3 and 4 on gfx9
      
      * set the gfx9 macro
      
      * make sure the architecture macros are set by cmake
      
      * use separate set of xdl/wmma flags for host code
      
      * sinmplify the main cmake file
      
      * add conv_fwd_bf8 instance declaration
      ae57e593
  15. 02 Feb, 2024 1 commit
  16. 15 Jan, 2024 1 commit
    • Illia Silin's avatar
      Add cppcheck to CK CI. (#1125) · e6d099c8
      Illia Silin authored
      * add cppcheck to the CK CI
      
      * fix the path to CK source for cppcheck
      
      * fix the path to CK source for cppcheck one more time
      
      * fix the path to CK source for cppcheck third time
      
      * change the path to ck_cppcheck.log
      
      * install latest cppcheck from source
      
      * fix bug in ck.hpp and use 20 threads for cppcheck
      
      * create a switch to turn cppckeck on and off in CI
      e6d099c8
  17. 03 Dec, 2023 1 commit
    • Bartlomiej Wroblewski's avatar
      Add support for double buffering in direct load GEMM kernel (#1052) · bc4bf9bd
      Bartlomiej Wroblewski authored
      This PR introduces support for double buffering in LDS into GEMM kernels that use direct load instructions.
      
      Direct loads now use inline asm instead of intrinsics. Usage of intrinsics results in compiler adding additional waitcnt instructions what breaks possible load/compute overlap in case of double buffering.
      
      Usage of inline asm results in the need to use sched_barrier in order to make sure that compiler cannot incorrectly reschedule instructions since it does not know the data dependencies between global->LDS and LDS->registers.
      bc4bf9bd
  18. 28 Nov, 2023 1 commit
  19. 19 Oct, 2023 1 commit
  20. 23 Aug, 2023 1 commit
    • Jun Liu's avatar
      [HotFix] add config and version files to pass on build info (#856) · c8a8385f
      Jun Liu authored
      * experiment with config file
      
      * experiment with version.h config
      
      * add more info to version.h
      
      * minor updates
      
      * minor updates
      
      * fix case where DTYPE is not used
      
      * large amount of files but minor changes
      
      * remove white space
      
      * minor changes to add more MACROs
      
      * fix cmakedefine01
      
      * fix issue with CK internal conflict
      
      * fix define and define value
      
      * fix clang-format
      
      * fix formatting issue
      
      * experiment with cmake
      
      * clang format v12 to be consistent with miopen
      
      * avoid clang-format for config file
      c8a8385f
  21. 22 Aug, 2023 1 commit
  22. 14 Aug, 2023 1 commit
  23. 03 Aug, 2023 2 commits
  24. 27 Jul, 2023 1 commit
  25. 06 Jul, 2023 1 commit
    • Po Yen Chen's avatar
      Split GEMM instance library & enable pipeline v2 optimization (#783) · 850144a0
      Po Yen Chen authored
      * Move source file into sub-directories
      
      * Add missing include directive
      
      * Split DeviceGemmXdl<> fp16 instances
      
      * Fix format
      
      * Remove unnecessary CMakeLists.txt
      
      * Add macros to toggle new features
      
      * Remove debug message
      
      * Turn off GEMM v2 pipeline optimization by default
      
      * Fix format
      
      * Extract duplicated string as list
      
      * Enlarge indent in CMakeLists.txt
      850144a0
  26. 21 Jun, 2023 1 commit
  27. 15 Jun, 2023 1 commit
    • Illia Silin's avatar
      Enable gfx941 and gfx942 architectures. (#752) · 027e46ee
      Illia Silin authored
      * enable gfx941/942 targets
      
      * fix clang format
      
      * fix the cmake logic for multiple targets
      
      * fix cmake syntax for looping over targets
      
      * add gfx941/942 support for gemm_xdl instances
      027e46ee
  28. 31 May, 2023 1 commit
  29. 04 May, 2023 1 commit
    • Rostyslav Geyyer's avatar
      Optimize bf16 conversion (#664) · b076a02a
      Rostyslav Geyyer authored
      * Add TypeConvert class and start refactoring
      
      * Refactor TypeConvert as a struct
      
      * Get back to template functions type_convert
      
      * Add a type_convert_bf16_rtn, set rtz as default
      
      * Clean up
      
      * Add UnaryConvertPrecision struct for high-precision workloads
      
      * Format
      
      * Update type_convert to UnaryConvert on threadwise level
      
      * Update UnaryConvertPrecision
      
      * Format
      
      * Fix chmod
      
      * Add a flag to pick converion method
      
      * Format
      
      * Remove the added flag
      
      * Merge elementwise op with type conversion
      
      * Move type_convert to elemwise op, update the op
      
      * Update type_convert_precision -> bf16_convert_rtn
      
      * Clean up
      
      * Update comments
      
      * Update the CK_WORKAROUND_DENORM_FIX flag handling
      
      * Update the unneeded op to work but warn user
      
      * Remove the message
      
      * Use a PassThrough instead of ConvertBF16RTN to calcaulate reference
      
      * Format
      
      * Add missing include
      b076a02a
  30. 28 Apr, 2023 1 commit
  31. 11 Apr, 2023 1 commit
  32. 30 Mar, 2023 1 commit
  33. 20 Mar, 2023 1 commit
  34. 09 Mar, 2023 1 commit
  35. 27 Feb, 2023 1 commit
  36. 15 Feb, 2023 1 commit
  37. 18 Jan, 2023 1 commit
    • Raman R jana's avatar
      Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760
      Raman R jana authored
      
      
      * wavelet gemm programming model support for CK
      
      * GEMM pipeline update for wavelet progrmmaing model
      
      * Updated wavelet programming pipeline
      
      * fixes for global-write for math-wave
      
      * fixed bug in global writes
      
      * Updated comments for better readability
      
      * fixed clang format errors
      
      * added block_lds without barrier sync
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor
      
      * prototype
      
      4 layouts
      
      fix default stride
      
      all problem sizes
      
      tidy
      
      move file
      
      update build script
      
      restore old file
      
      fix build
      
      * refactor standalone test to use gemm test harness
      
      * simplify gemm test
      
      * update build script
      
      * remove redundant
      
      * early return when cmd arg doesn't match
      
      * tidy
      
      * report failure when result not validated
      
      * tidy
      
      * Add comment depicting B2C mapping pattern.
      
      * Formatting & comments.
      
      * Comparison with custom B2C mapping pattern.
      
      * Example for wavelet gemm.
      
      * Add wavelet to Gemm standalone test.
      
      * Remove debug code.
      
      * Remove dangling #endif directive.
      
      Co-authored-by: root <Raman Jana>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarAnthony Chang <ac.chang@outlook.com>
      Co-authored-by: default avatarAdam Osewski <19374865+aosewski@users.noreply.github.com>
      1cfa8760
  38. 12 Jan, 2023 1 commit
  39. 02 Dec, 2022 1 commit