1. 09 Apr, 2024 1 commit
    • Bartłomiej Kocot's avatar
      Extend support for contraction 6D (#1207) · ced5af16
      Bartłomiej Kocot authored
      * Extend support for contraction up to 5D
      
      * Extend contraction bilinear instances
      
      * Fix interface test
      
      * Add 6d support, remove 3d,4d,5d
      
      * Fixes
      
      * Fix readme
      
      * Make defualt dim for contraction instances
      ced5af16
  2. 02 Apr, 2024 1 commit
    • Illia Silin's avatar
      Split the instances by architecture. (#1223) · ae57e593
      Illia Silin authored
      * parse examples inside the add_example_executable function
      
      * fix the example 64 cmake file
      
      * add xdl flag to the gemm_bias_softmax_gemm_permute example
      
      * add filtering of tests based on architecture type
      
      * enable test_grouped_gemm for gfx9 only
      
      * enable test_transpose only for gfx9
      
      * only linnk test_transpose if it gets built
      
      * split the gemm instances by architectures
      
      * split gemm_bilinear,grouped_conv_bwd_weight instances by targets
      
      * split instances by architecture
      
      * split grouped_conv instances by architecture
      
      * fix clang format
      
      * fix the if-else logic in group_conv headers
      
      * small fix for grouped convolution instances
      
      * fix the grouped conv bwd weight dl instances
      
      * fix client examples
      
      * only enable client examples 3 and 4 on gfx9
      
      * set the gfx9 macro
      
      * make sure the architecture macros are set by cmake
      
      * use separate set of xdl/wmma flags for host code
      
      * sinmplify the main cmake file
      
      * add conv_fwd_bf8 instance declaration
      ae57e593
  3. 02 Nov, 2023 1 commit
    • Bartlomiej Wroblewski's avatar
      Add support for mixed precision in contraction scale and bilinear (#973) · 4ef704d8
      Bartlomiej Wroblewski authored
      
      
      * Add support for mixed precision in contraction scale and bilinear (#936)
      
      * Extract common functionality to separate files
      
      * Reference contraction: Remove incorrect consts from type_converts
      
      * Reference contraction: Add missing type_convert for dst value
      
      * Reference contraction: Fix incorrect order of B matrix dimensions
      
      * Add support for mixed precision in contraction scale and bilinear
      
      * Move using statements from instances to a common file
      
      * Move using statements from examples to a common file
      
      * Fix the order of B matrix dimensions across examples and profiler
      
      * Fix the computation of error threshold
      
      * Make ComputeDataType an optional argument
      
      * Include possible DataType -> ComputeDataType casting error in the threshold
      
      * Remove commented code
      
      * Make the ComputeDataType an optional argument in instance
      
      ---------
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      4ef704d8
  4. 05 Oct, 2023 1 commit
  5. 29 Sep, 2023 1 commit
    • Bartlomiej Wroblewski's avatar
      Add support for mixed precision in contraction scale and bilinear (#936) · f0748506
      Bartlomiej Wroblewski authored
      * Extract common functionality to separate files
      
      * Reference contraction: Remove incorrect consts from type_converts
      
      * Reference contraction: Add missing type_convert for dst value
      
      * Reference contraction: Fix incorrect order of B matrix dimensions
      
      * Add support for mixed precision in contraction scale and bilinear
      
      * Move using statements from instances to a common file
      
      * Move using statements from examples to a common file
      
      * Fix the order of B matrix dimensions across examples and profiler
      
      * Fix the computation of error threshold
      
      * Make ComputeDataType an optional argument
      
      * Include possible DataType -> ComputeDataType casting error in the threshold
      
      * Remove commented code
      f0748506
  6. 22 Aug, 2023 1 commit
  7. 15 May, 2023 2 commits
    • Illia Silin's avatar
      Update staging branch. (#706) · 72b7ae25
      Illia Silin authored
      
      
      * update daily build from rocm 5.4.3 to 5.5 (#693)
      
      * Fix grouped_gemm_splitk kernels on MI300. (#694)
      
      * replace amd_buffer_atomic_add with hip_atomic_add
      
      * fix grouped_gemm_splitk kernels on mi300
      
      * fix syntax
      
      * revert experimental atomic_add changes
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      
      * Fix the group of quantization_int8 kernels on MI300. (#695)
      
      * replace amd_buffer_atomic_add with hip_atomic_add
      
      * fix grouped_gemm_splitk kernels on mi300
      
      * fix syntax
      
      * revert experimental atomic_add changes
      
      * fix the group of kernels from ticket 723 on MI300
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      
      * Optimize bf16 conversion (#664)
      
      * Add TypeConvert class and start refactoring
      
      * Refactor TypeConvert as a struct
      
      * Get back to template functions type_convert
      
      * Add a type_convert_bf16_rtn, set rtz as default
      
      * Clean up
      
      * Add UnaryConvertPrecision struct for high-precision workloads
      
      * Format
      
      * Update type_convert to UnaryConvert on threadwise level
      
      * Update UnaryConvertPrecision
      
      * Format
      
      * Fix chmod
      
      * Add a flag to pick converion method
      
      * Format
      
      * Remove the added flag
      
      * Merge elementwise op with type conversion
      
      * Move type_convert to elemwise op, update the op
      
      * Update type_convert_precision -> bf16_convert_rtn
      
      * Clean up
      
      * Update comments
      
      * Update the CK_WORKAROUND_DENORM_FIX flag handling
      
      * Update the unneeded op to work but warn user
      
      * Remove the message
      
      * Use a PassThrough instead of ConvertBF16RTN to calcaulate reference
      
      * Format
      
      * Add missing include
      
      * Normalization/split k (#615)
      
      * Add contraction profiler and tests (#701)
      
      * Add contraction profiler and tests
      
      * Build and style fixes
      
      * Allow to use any elementwise operator for ref_contraction
      
      * Introduce profile_contraction_scale and profile_contraction_bilinear
      
      * Make ref_contraction generic and extend interface tests
      
      * Stylistic minor fixes
      
      * Extend test_contraction_interface
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      Co-authored-by: default avatarRostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: default avatarrocking <ChunYu.Lai@amd.com>
      Co-authored-by: default avatarBartłomiej Kocot <bartlomiejkocot98@gmail.com>
      72b7ae25
    • Bartłomiej Kocot's avatar
      Add contraction profiler and tests (#701) · 642d5e91
      Bartłomiej Kocot authored
      * Add contraction profiler and tests
      
      * Build and style fixes
      
      * Allow to use any elementwise operator for ref_contraction
      
      * Introduce profile_contraction_scale and profile_contraction_bilinear
      
      * Make ref_contraction generic and extend interface tests
      
      * Stylistic minor fixes
      
      * Extend test_contraction_interface
      642d5e91