    Update staging branch. (#706) · 72b7ae25
    Illia Silin authored
    
    
    * update daily build from rocm 5.4.3 to 5.5 (#693)
    
    * Fix grouped_gemm_splitk kernels on MI300. (#694)
    
    * replace amd_buffer_atomic_add with hip_atomic_add
    
    * fix grouped_gemm_splitk kernels on mi300
    
    * fix syntax
    
    * revert experimental atomic_add changes
    
    ---------
    Co-authored-by: Jing Zhang <jizhan@amd.com>
    
    * Fix the group of quantization_int8 kernels on MI300. (#695)
    
    * replace amd_buffer_atomic_add with hip_atomic_add
    
    * fix grouped_gemm_splitk kernels on mi300
    
    * fix syntax
    
    * revert experimental atomic_add changes
    
    * fix the group of kernels from ticket 723 on MI300
    
    ---------
    Co-authored-by: Jing Zhang <jizhan@amd.com>
    
    * Optimize bf16 conversion (#664)
    
    * Add TypeConvert class and start refactoring
    
    * Refactor TypeConvert as a struct
    
    * Get back to template functions type_convert
    
    * Add a type_convert_bf16_rtn, set rtz as default
    
    * Clean up
    
    * Add UnaryConvertPrecision struct for high-precision workloads
    
    * Format
    
    * Update type_convert to UnaryConvert on threadwise level
    
    * Update UnaryConvertPrecision
    
    * Format
    
    * Fix chmod
    
    * Add a flag to pick conversion method
    
    * Format
    
    * Remove the added flag
    
    * Merge elementwise op with type conversion
    
    * Move type_convert to elemwise op, update the op
    
    * Update type_convert_precision -> bf16_convert_rtn
    
    * Clean up
    
    * Update comments
    
    * Update the CK_WORKAROUND_DENORM_FIX flag handling
    
    * Update the unneeded op to work but warn user
    
    * Remove the message
    
    * Use a PassThrough instead of ConvertBF16RTN to calculate reference
    
    * Format
    
    * Add missing include
    
    * Normalization/split k (#615)
    
    * Add contraction profiler and tests (#701)
    
    * Add contraction profiler and tests
    
    * Build and style fixes
    
    * Allow to use any elementwise operator for ref_contraction
    
    * Introduce profile_contraction_scale and profile_contraction_bilinear
    
    * Make ref_contraction generic and extend interface tests
    
    * Stylistic minor fixes
    
    * Extend test_contraction_interface
    
    ---------
    Co-authored-by: Jing Zhang <jizhan@amd.com>
    Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
    Co-authored-by: rocking <ChunYu.Lai@amd.com>
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
Jenkinsfile 30.2 KB