- 15 Jan, 2025 1 commit
-
-
Bartłomiej Kocot authored
* Add rounding for float to bf16 conversion
* Add bhalf test
* Add inf test bhalf
* Refactor
* update cmake
* Fixes
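For readers unfamiliar with the rounding this commit message refers to, here is a minimal sketch of a float to bf16 conversion with round-to-nearest-even, including the infinity case the tests mention. It is not taken from the repository: the function names `float_to_bf16_rne` and `bf16_to_float` are hypothetical, and the choice of round-to-nearest-even is an assumption, since the commit message does not say which rounding mode was added.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the library's API): convert an IEEE-754 binary32
// value to bf16 bits using round-to-nearest-even on the 16 discarded bits.
inline std::uint16_t float_to_bf16_rne(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits)); // well-defined type punning

    // NaN: truncate and force a mantissa bit so the result stays a NaN.
    if((bits & 0x7fffffffu) > 0x7f800000u)
    {
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }

    // Add 0x7fff plus the lowest kept bit: ties round to the even value.
    // Infinity passes through unchanged because its low 16 bits are zero.
    const std::uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    bits += rounding_bias;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bf16 bits back to float (exact, no rounding).
inline float bf16_to_float(std::uint16_t h)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

Plain truncation (`bits >> 16`) would always round toward zero; adding the bias first is what makes values halfway between two bf16 numbers round to the one with an even mantissa.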
-
- 18 Dec, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Add a rounding flag
* Add FP6 and BF6
* Add tests
* Clean up

Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
-
- 11 Dec, 2024 1 commit
-
-
illsilin authored
-
- 09 Dec, 2024 2 commits
- 06 Dec, 2024 1 commit
-
-
root authored
-
- 27 Nov, 2024 1 commit
-
-
illsilin authored
-
- 22 Nov, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 13 Nov, 2024 1 commit
-
-
root authored
-
- 07 Nov, 2024 1 commit
-
-
Illia Silin authored
-
- 04 Nov, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 21 Aug, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Set RNE fp8 conversion as a default
* Update f8 tests
* Disable failing test on gfx11
* Update bf8 tests
* Add a flag
* Fix the flag
* Raise flag for gfx10 as well
* Temp commit for tolerance testing
* Update tolerances
-
- 27 Jun, 2024 1 commit
-
-
Illia Silin authored
-
- 17 Jun, 2024 1 commit
-
-
zjing14 authored
-
- 10 May, 2024 1 commit
-
-
Illia Silin authored
* code clean-up
* remove the profiling output samples
-
- 07 May, 2024 1 commit
-
-
Illia Silin authored
* enable logging using environment variable
* update ck.hpp header
* fix typo
* fix clang format
* Update include/ck/utility/env.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
-
- 29 Apr, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Add a flag
* Add flag check and messages

Co-authored-by: root <root@aus-g7-rogeyyer.amd.com>
-
- 02 Apr, 2024 1 commit
-
-
Illia Silin authored
* parse examples inside the add_example_executable function
* fix the example 64 cmake file
* add xdl flag to the gemm_bias_softmax_gemm_permute example
* add filtering of tests based on architecture type
* enable test_grouped_gemm for gfx9 only
* enable test_transpose only for gfx9
* only link test_transpose if it gets built
* split the gemm instances by architectures
* split gemm_bilinear,grouped_conv_bwd_weight instances by targets
* split instances by architecture
* split grouped_conv instances by architecture
* fix clang format
* fix the if-else logic in group_conv headers
* small fix for grouped convolution instances
* fix the grouped conv bwd weight dl instances
* fix client examples
* only enable client examples 3 and 4 on gfx9
* set the gfx9 macro
* make sure the architecture macros are set by cmake
* use separate set of xdl/wmma flags for host code
* simplify the main cmake file
* add conv_fwd_bf8 instance declaration
-
- 12 Mar, 2024 1 commit
-
-
illsilin authored
-
- 27 Feb, 2024 1 commit
-
-
aska-0096 authored
-
- 21 Feb, 2024 1 commit
-
-
illsilin authored
-
- 17 Feb, 2024 1 commit
-
-
Jing Zhang authored
-
- 16 Feb, 2024 1 commit
-
-
Jing Zhang authored
-
- 15 Feb, 2024 2 commits
- 14 Feb, 2024 1 commit
-
-
illsilin authored
-
- 02 Feb, 2024 1 commit
-
-
Illia Silin authored
* add support for navi2x and navi3x models
* fix syntax
* use common macro for different mi300 architectures
-
- 15 Jan, 2024 1 commit
-
-
Illia Silin authored
* add cppcheck to the CK CI
* fix the path to CK source for cppcheck
* fix the path to CK source for cppcheck one more time
* fix the path to CK source for cppcheck third time
* change the path to ck_cppcheck.log
* install latest cppcheck from source
* fix bug in ck.hpp and use 20 threads for cppcheck
* create a switch to turn cppcheck on and off in CI
-
- 03 Dec, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
This PR introduces support for double buffering in LDS in GEMM kernels that use direct load instructions. Direct loads now use inline asm instead of intrinsics: with intrinsics, the compiler adds extra waitcnt instructions, which breaks the possible load/compute overlap that double buffering relies on. Inline asm, in turn, requires sched_barrier to make sure the compiler cannot incorrectly reschedule instructions, since it does not know the data dependencies between global->LDS and LDS->registers.
-
- 28 Nov, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Switch default f8 conversion to stochastic rounding
* Refactor f8-related type_converts
* Add an element-wise op
-
- 19 Oct, 2023 1 commit
-
-
Illia Silin authored
* apply the patch for dl kernels on gfx11
* build DL kernels on navi32 CI
-
- 23 Aug, 2023 1 commit
-
-
Jun Liu authored
* experiment with config file
* experiment with version.h config
* add more info to version.h
* minor updates
* minor updates
* fix case where DTYPE is not used
* large amount of files but minor changes
* remove white space
* minor changes to add more MACROs
* fix cmakedefine01
* fix issue with CK internal conflict
* fix define and define value
* fix clang-format
* fix formatting issue
* experiment with cmake
* clang format v12 to be consistent with miopen
* avoid clang-format for config file
-
- 22 Aug, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Fix transform and instances for grouped conv bwd data
* Add instances for small K and small C
* Remove workaround after fix
* Fix interface tests
-
- 14 Aug, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
-
- 03 Aug, 2023 2 commits
-
-
Bartlomiej Kocot authored
-
Bartlomiej Kocot authored
-
- 27 Jul, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add s_nops after v_dot to avoid hazard
* Fix builtin for inner_product fp16
* Skip inline version to builtin
* Add comments regarding isa
* Fix comment regarding s_nop
-
- 06 Jul, 2023 1 commit
-
-
Po Yen Chen authored
* Move source file into sub-directories
* Add missing include directive
* Split DeviceGemmXdl<> fp16 instances
* Fix format
* Remove unnecessary CMakeLists.txt
* Add macros to toggle new features
* Remove debug message
* Turn off GEMM v2 pipeline optimization by default
* Fix format
* Extract duplicated string as list
* Enlarge indent in CMakeLists.txt
-
- 21 Jun, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Support bf16/f32/f16 and NHWGC conv2d_bwd_data
* Add interface test
* clang format
* Comment fixes
* Add more friendly error message
-
- 15 Jun, 2023 1 commit
-
-
Illia Silin authored
* enable gfx941/942 targets
* fix clang format
* fix the cmake logic for multiple targets
* fix cmake syntax for looping over targets
* add gfx941/942 support for gemm_xdl instances
-