- 15 May, 2023 1 commit
-
-
Illia Silin authored
* update daily build from rocm 5.4.3 to 5.5 (#693) * Fix grouped_gemm_splitk kernels on MI300. (#694) * replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes --------- Co-authored-by:
Jing Zhang <jizhan@amd.com> * Fix the group of quantization_int8 kernels on MI300. (#695) * replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes * fix the group of kernels from ticket 723 on MI300 --------- Co-authored-by:
Jing Zhang <jizhan@amd.com> * Optimize bf16 conversion (#664) * Add TypeConvert class and start refactoring * Refactor TypeConvert as a struct * Get back to template functions type_convert * Add a type_convert_bf16_rtn, set rtz as default * Clean up * Add UnaryConvertPrecision struct for high-precision workloads * Format * Update type_convert to UnaryConvert on threadwise level * Update UnaryConvertPrecision * Format * Fix chmod * Add a flag to pick converion method * Format * Remove the added flag * Merge elementwise op with type conversion * Move type_convert to elemwise op, update the op * Update type_convert_precision -> bf16_convert_rtn * Clean up * Update comments * Update the CK_WORKAROUND_DENORM_FIX flag handling * Update the unneeded op to work but warn user * Remove the message * Use a PassThrough instead of ConvertBF16RTN to calcaulate reference * Format * Add missing include * Normalization/split k (#615) * Add contraction profiler and tests (#701) * Add contraction profiler and tests * Build and style fixes * Allow to use any elementwise operator for ref_contraction * Introduce profile_contraction_scale and profile_contraction_bilinear * Make ref_contraction generic and extend interface tests * Stylistic minor fixes * Extend test_contraction_interface --------- Co-authored-by:
Jing Zhang <jizhan@amd.com> Co-authored-by:
Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by:
rocking <ChunYu.Lai@amd.com> Co-authored-by:
Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
-
- 08 Mar, 2023 1 commit
-
-
Adam Osewski authored
* Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
- 09 Feb, 2023 1 commit
-
-
rocking5566 authored
* Add gemm + layernorm instance * Add ckProfiler * Add test * Add client example * Detect if user forger to set the workrspace * Use literal in the example * [What] use builtin function for sqrt [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt() * check gemm vaildity in IsSupportedArgument * Add more testcases * Merge duplicated folder in client example * Print more infomation * Use better kernel parameter for MS problem size * clang format * Add constexpr for if condition and remove redundant include * Remove cstdlib and add constexpr
-
- 25 Jan, 2023 1 commit
-
-
Qianfeng authored
* File renaming and class renaming for device element-wise operation * Add batchnorm-infer instances, external API and client example * Add batchnorm-infer profiler module and gtests * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer * Rename class and file due to conflict from device_elementwise_2d.hpp * Fix namespace in batcnnorm_infer_nhwc client example
-
- 18 Jan, 2023 1 commit
-
-
ltqin authored
* start add example * fix config * fix showinfo bug * add an elementop * change to padding * add xdl example * change elementwiseop * add instance * add instance to profiler * change file name * fix deive not support issue * add client example * fix client gemm_add_multiply name * change AddMultiply elementwiseop * fix elementwiseop * fix client example * fix addmultiply op * fix comments and fun name Co-authored-by:letaoqin <letaoqin@amd.com>
-
- 15 Dec, 2022 1 commit
-
-
Rostyslav Geyyer authored
Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable arbitrary problem size (#535) * Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances * Add padding device_gemm_add_fastgelu_xdl_c_shuffle instances * Add gemm_add_fastgelu profiler impl * Add padding device_gemm_fastgelu_xdl_c_shuffle instances * Add gemm_fastgelu profiler impl
-
- 01 Dec, 2022 1 commit
-
-
Po Yen Chen authored
* Re-structure ckProfiler source files * Rename profiler.cpp to main.cpp * Modularize ckProfiler operations * Add description for profiler operations * Use longer name to avoid name collision * Use macro to delay expansion * Use std::move() to avoid object copying * Prohibit users from calling dtor * Use macro to eliminate redundant code * Make friend function hidden * Add missing include directive <iostream> * Fix wrong include directives * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test Co-authored-by:Qianfeng Zhang <Qianfeng.Zhang@amd.com>
-