- 31 Jan, 2025 1 commit
-
-
arai713 authored
* updating codegen build for MIOpen access: adding .cmake for codegen component * updating CMake * adding in header guards for some headers due to issues with hiprtc compilation in MIOpen * some more header guards * putting env file in header guard * cleaning up some includes * updated types file for hiprtc purposes * fixed types file: bit-wise/memcpy issue * updating multiple utility files to deal with standard header inclusion for hiprtc * added some more header guards in the utility files, replacing some standard header functionality * added some more header guards * fixing some conflicts in utility files, another round of header guards * fixing errors in data type file * resolved conflict errors in a few utility files * added header guards/replicated functionality in device files * resolved issues with standard headers in device files: device_base and device_grouped_conv_fwd_multiple_abd * resolved issues with standard headers in device files: device_base.hpp, device_grouped_conv_fwd_multiple_abd.hpp, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp * added header guards for gridwise gemm files: gridwise_gemm_multiple_abd_xdl_cshuffle.hpp and gridwise_gemm_multiple_d_xdl_cshuffle.hpp * fixed issue with numerics header, removed from transform_conv_fwd_to_gemm and added to device_column_to_image_impl, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3, device_image_to_column_impl * replaced standard header usage and added header guards in block to ctile map and gridwise_gemm_pipeline_selector * resolved errors in device_gemm_xdl_splitk_c_shuffle files in regards to replacement of standard headers in previous commit * added replicated functionality for standard header methods in utility files * replaced standard header functionality in threadwise tensor slice transfer files and added header guards in element_wise_operation.hpp * temp fix for namespace error in MIOpen * remove standard header usage in 
codegen device op * removed standard header usage in elementwise files, resolved namespace errors * formatting fix * changed codegen argument to ON for testing * temporarily removing codegen compiler flag for testing purposes * added codegen flag again, set default to ON * set codegen flag default back to OFF * replaced enable_if_t standard header usage in data_type.hpp * added some debug prints to pinpoint issues in MIOpen * added print outs to debug in MIOpen * removed debug print outs from device op * resolved stdexcept include error * formatting fix * adding includes to new fp8 file to resolve ck::enable_if_t errors * made changes to amd_wave_read_first_lane * updated functionality in type utility file * fixed end of file issue * resolved errors in type utility file, added functionality to array utility file * fixed standard header usage replication in data_type file, resolves error with failing examples on navi3x * formatting fix * replaced standard header usage in amd_ck_fp8 file * added include to random_gen file * removed and replicated standard header usage from data_type and type_convert files for fp8 changes * replicated standard unsigned integer types in random_gen * resolved comments from review: put calls to reinterpret_cast for size_t in header guards * updated/added copyright headers * removed duplicate header * fixed typo in header guard * updated copyright headers --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 28 Jan, 2025 1 commit
-
-
darren-amd authored
* Change flag from CK_WORKAROUND_DENORM_FIX to CK_GFX90A_DENORM_WORKAROUND for more clarity. Also changed the definition macros to be more clear.
-
- 20 Jan, 2025 1 commit
-
-
lucbruni-amd authored
* Disable CK_TIME_KERNEL by Default, Add as CMake Variable * Enable CK_TIME_KERNEL by Default, Maintaining CMake Variable Functionality. * Fix build error.
-
- 15 Jan, 2025 1 commit
-
-
Bartłomiej Kocot authored
* Add rounding for float to bf16 conversion * Add bhalf test * Add inf test bhalf * Refactor * update cmake * Fixes
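The commit message does not say which rounding mode was added. As an illustration only, the following sketch assumes round-to-nearest-even (RNE), a common choice for float to bf16 conversion. The helper names `float_to_bf16_rne` and `bf16_to_float` are hypothetical and are not taken from the repository.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the repository's conversion routine.
// Converts an IEEE-754 binary32 value to bf16 bits, assuming RNE rounding.
inline std::uint16_t float_to_bf16_rne(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits)); // bit-exact copy, avoids aliasing issues

    if(std::isnan(x))
    {
        // Keep the sign and exponent, and force a mantissa bit so the result stays NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }

    // Adding 0x7FFF plus the lowest retained bit rounds halfway cases to even.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical inverse helper: widen bf16 bits back to binary32 (exact).
inline float bf16_to_float(std::uint16_t h)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

Under this assumption, a value exactly halfway between two bf16 neighbours rounds to the one whose last mantissa bit is zero, and infinities pass through unchanged, which is the kind of behaviour the "inf test" bullet would exercise.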
-
- 18 Dec, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Add a rounding flag * Add FP6 and BF6 * Add tests * Clean up --------- Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
-
- 11 Dec, 2024 1 commit
-
-
illsilin authored
-
- 09 Dec, 2024 2 commits
- 06 Dec, 2024 1 commit
-
-
root authored
-
- 27 Nov, 2024 1 commit
-
-
illsilin authored
-
- 22 Nov, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 13 Nov, 2024 1 commit
-
-
root authored
-
- 07 Nov, 2024 1 commit
-
-
Illia Silin authored
-
- 04 Nov, 2024 1 commit
-
-
Rostyslav Geyyer authored
-
- 21 Aug, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Set RNE fp8 conversion as a default * Update f8 tests * Disable failing test on gfx11 * Update bf8 tests * Add a flag * Fix the flag * Raise flag for gfx10 as well * Temp commit for tolerance testing * Update tolerances
-
- 27 Jun, 2024 1 commit
-
-
Illia Silin authored
-
- 17 Jun, 2024 1 commit
-
-
zjing14 authored
-
- 10 May, 2024 1 commit
-
-
Illia Silin authored
* code clean-up * remove the profiling output samples
-
- 07 May, 2024 1 commit
-
-
Illia Silin authored
* enable logging using environment variable * update ck.hpp header * fix typo * fix clang format * Update include/ck/utility/env.hpp --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
-
- 29 Apr, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Add a flag * Add flag check and messages --------- Co-authored-by: root <root@aus-g7-rogeyyer.amd.com>
-
- 02 Apr, 2024 1 commit
-
-
Illia Silin authored
* parse examples inside the add_example_executable function * fix the example 64 cmake file * add xdl flag to the gemm_bias_softmax_gemm_permute example * add filtering of tests based on architecture type * enable test_grouped_gemm for gfx9 only * enable test_transpose only for gfx9 * only link test_transpose if it gets built * split the gemm instances by architectures * split gemm_bilinear, grouped_conv_bwd_weight instances by targets * split instances by architecture * split grouped_conv instances by architecture * fix clang format * fix the if-else logic in group_conv headers * small fix for grouped convolution instances * fix the grouped conv bwd weight dl instances * fix client examples * only enable client examples 3 and 4 on gfx9 * set the gfx9 macro * make sure the architecture macros are set by cmake * use separate set of xdl/wmma flags for host code * simplify the main cmake file * add conv_fwd_bf8 instance declaration
-
- 12 Mar, 2024 1 commit
-
-
illsilin authored
-
- 27 Feb, 2024 1 commit
-
-
aska-0096 authored
-
- 21 Feb, 2024 1 commit
-
-
illsilin authored
-
- 17 Feb, 2024 1 commit
-
-
Jing Zhang authored
-
- 16 Feb, 2024 1 commit
-
-
Jing Zhang authored
-
- 15 Feb, 2024 2 commits
- 14 Feb, 2024 1 commit
-
-
illsilin authored
-
- 02 Feb, 2024 1 commit
-
-
Illia Silin authored
* add support for navi2x and navi3x models * fix syntax * use common macro for different mi300 architectures
-
- 15 Jan, 2024 1 commit
-
-
Illia Silin authored
* add cppcheck to the CK CI * fix the path to CK source for cppcheck * fix the path to CK source for cppcheck one more time * fix the path to CK source for cppcheck third time * change the path to ck_cppcheck.log * install latest cppcheck from source * fix bug in ck.hpp and use 20 threads for cppcheck * create a switch to turn cppcheck on and off in CI
-
- 03 Dec, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
This PR introduces support for double buffering in LDS into GEMM kernels that use direct load instructions. Direct loads now use inline asm instead of intrinsics. Using intrinsics causes the compiler to add extra waitcnt instructions, which breaks the possible load/compute overlap in the double-buffering case. Using inline asm in turn requires sched_barrier to make sure that the compiler cannot incorrectly reschedule instructions, since it does not know the data dependencies between global->LDS and LDS->registers.
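The double-buffering pattern this PR describes can be sketched on the host as a ping-pong between two tile buffers. This is only a sequential CPU analogy under stated assumptions: `kTile`, `load_tile`, `consume_tile`, and `sum_double_buffered` are hypothetical names, the "load" is a plain copy rather than an asynchronous global->LDS direct load, and none of the inline asm, waitcnt, or sched_barrier handling from the PR is modelled.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical tile size, chosen only for illustration.
constexpr std::size_t kTile = 4;
using Tile = std::array<float, kTile>;

// Stand-in for a global->LDS direct load of one tile.
inline void load_tile(const std::vector<float>& src, std::size_t tile, Tile& dst)
{
    for(std::size_t i = 0; i < kTile; ++i)
        dst[i] = src[tile * kTile + i];
}

// Stand-in for the compute stage that reads a tile (LDS->registers).
inline float consume_tile(const Tile& buf)
{
    float sum = 0.0f;
    for(float v : buf)
        sum += v;
    return sum;
}

// Sums all complete tiles of src using two alternating buffers.
// Trailing elements that do not fill a whole tile are ignored.
inline float sum_double_buffered(const std::vector<float>& src)
{
    const std::size_t num_tiles = src.size() / kTile;
    if(num_tiles == 0)
        return 0.0f;

    std::array<Tile, 2> lds{}; // the two "LDS" buffers
    float acc = 0.0f;

    load_tile(src, 0, lds[0]); // prologue: fill the first buffer

    for(std::size_t t = 0; t < num_tiles; ++t)
    {
        const std::size_t cur = t % 2;
        // Start filling the other buffer with the next tile. On a GPU this
        // load would be in flight while the current tile is being consumed.
        if(t + 1 < num_tiles)
            load_tile(src, t + 1, lds[1 - cur]);
        acc += consume_tile(lds[cur]);
    }
    return acc;
}
```

The point of the structure is that the load of tile `t + 1` is issued before tile `t` is consumed, so on hardware the two can overlap. An extra wait inserted between them serializes the stages, which is the overlap problem the PR attributes to the intrinsic-based loads.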
-
- 28 Nov, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Switch default f8 conversion to stochastic rounding * Refactor f8-related type_converts * Add an element-wise op
-
- 19 Oct, 2023 1 commit
-
-
Illia Silin authored
* apply the patch for dl kernels on gfx11 * build DL kernels on navi32 CI
-
- 23 Aug, 2023 1 commit
-
-
Jun Liu authored
* experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file
-
- 22 Aug, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Fix transform and instances for grouped conv bwd data * Add instances for small K and small C * Remove workaround after fix * Fix interface tests
-
- 14 Aug, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
-
- 03 Aug, 2023 2 commits
-
-
Bartlomiej Kocot authored
-
Bartlomiej Kocot authored
-
- 27 Jul, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add s_nops after v_dot to avoid hazard * Fix builtin for inner_product fp16 * Skip inline version to builtin * Add comments regarding isa * Fix comment regarding s_nop
-